Intersect function with Rcpp - c++

I'm having a hard time implementing a function with the Rcpp module using cppFunction. I need to use something like R's intersect with two NumericVector types and return another NumericVector with the result, just like in R.
This document has been of some help but unfortunately I'm pretty much a noob in C++ atm.
How could I implement R's intersect function with cppFunction?
Thanks

You would probably want to use something like the unordered_set to implement intersect:
File myintersect.cpp:
#include <Rcpp.h>
#include <unordered_set>
#include <vector>
using namespace Rcpp;
// Enable C++11 via this plugin (Rcpp 0.10.3 or later)
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
NumericVector myintersect(NumericVector x, NumericVector y) {
std::vector<double> res;
std::unordered_set<double> s(y.begin(), y.end());
for (int i=0; i < x.size(); ++i) {
auto f = s.find(x[i]);
if (f != s.end()) {
res.push_back(x[i]);
s.erase(f);
}
}
return Rcpp::wrap(res);
}
We can load the function and verify it works:
library(Rcpp)
sourceCpp(file="myintersect.cpp")
set.seed(144)
x <- c(-1, -1, sample(seq(1000000), 10000, replace=T))
y <- c(-1, sample(seq(1000000), 10000, replace=T))
all.equal(intersect(x, y), myintersect(x, y))
# [1] TRUE
However, it seems this approach is a good deal less efficient than the intersect function:
library(microbenchmark)
microbenchmark(intersect(x, y), myintersect(x, y))
# Unit: microseconds
# expr min lq median uq max neval
# intersect(x, y) 424.167 495.861 501.919 523.7835 989.997 100
# myintersect(x, y) 1778.609 1798.111 1808.575 1835.1570 2571.426 100
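For experimenting outside of R, here is a minimal sketch of the same hash-set technique in plain C++, with std::vector standing in for NumericVector. The name intersect_sketch is made up for illustration, and the reserve() call is only an assumption about where time might go (rehashing while the set is filled); it has not been benchmarked against intersect.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Order-preserving intersection: values of x that also occur in y,
// each reported once, in order of first appearance in x.
// (intersect_sketch is a hypothetical name, not part of Rcpp.)
std::vector<double> intersect_sketch(const std::vector<double>& x,
                                     const std::vector<double>& y) {
    std::unordered_set<double> s;
    s.reserve(y.size());            // assumption: avoids rehashing while filling
    s.insert(y.begin(), y.end());

    std::vector<double> res;
    for (double v : x) {
        auto it = s.find(v);
        if (it != s.end()) {
            res.push_back(v);
            s.erase(it);            // a value can match only once
        }
    }
    return res;
}
```

The body could be pasted into the Rcpp function above with the vector types swapped back; whether it narrows the gap to intersect would need to be measured.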

Related

How to optimise Rcpp function (calling another R function)

I'm solving the following problem: given two vectors x and y and a vectorised function f, I'd like to compute, for each element x0 of x, the average of f(x0 - y).
I already implemented the function in R like this
sol <- function(x, y, f) {
ret <- numeric(length(x))
for (y0 in y) {
ret <- ret + f(x - y0)
}
ret/length(y)
}
We could call the function like this: sol(1:100, 1:100, exp). Since this function is a crucial part of my code I'd like to optimise it. The length of x is in the range 1 to 100,000 and the length of y is in the range 1 to 1,000. I tried using Rcpp like this
library(Rcpp)
cppFunction('NumericVector cppEval(NumericVector x, NumericVector y, Function f) {
int num_y = y.size();
NumericVector out(x.size());
for(int i = 0; i < num_y; ++i) {
out += Rcpp::as<NumericVector>(f(x - y[i]));
}
return out/num_y;
}')
Sadly this piece of code is much slower than the R equivalent. What could I do to efficiently write Cpp here? I don't know how to completely get rid of the loop.
microbenchmark::microbenchmark(sol(1:100, 1:100, exp), cppEval(1:100, 1:100, exp))
Unit: microseconds
expr min lq mean median uq max neval
sol(1:100, 1:100, exp) 157.572 178.336 244.4421 210.4775 221.7085 4199.367 100
cppEval(1:100, 1:100, exp) 1451.395 1628.367 1829.2443 1697.7480 1794.4390 12868.237 100
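For comparison, here is a minimal plain C++ sketch of the same computation. It assumes f can be written as a C++ callable (for example a lambda wrapping std::exp) instead of being passed in as an R function, so it is not a drop-in replacement for cppEval. The name mean_shifted and the template parameter F are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// For each x0 in x, return the mean over y0 in y of f(x0 - y0).
// F is any callable double -> double; it is a hypothetical stand-in
// for the R function f in the question.
template <typename F>
std::vector<double> mean_shifted(const std::vector<double>& x,
                                 const std::vector<double>& y, F f) {
    assert(!y.empty());             // the average is undefined for empty y
    std::vector<double> out(x.size(), 0.0);
    for (std::size_t i = 0; i < x.size(); ++i) {
        double total = 0.0;
        for (double y0 : y) total += f(x[i] - y0);
        out[i] = total / static_cast<double>(y.size());
    }
    return out;
}
```

In the cppEval version every call to f goes back through the R interpreter, so a sketch like this only applies when f is known at compile time.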

Armadillo - Norm of each small block in a long vector

I am using Armadillo in C++.
I have a long vector with 10 elements. I want to take the 2-norm of each block of 2 adjacent values, so that in the end I have 5 values.
In R I can convert that vector into a matrix and use apply, but I am not sure how to do it in Armadillo. I'd appreciate any help.
You just have to create a matrix from your vector and then loop through the columns.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::vec foo_Cpp(arma::vec x) {
// Note that the dimension of x must be divisible by two.
arma::mat X = arma::mat(x.memptr(), 2, x.n_elem/2);
arma::uword n = X.n_cols;
arma::vec norms = arma::vec(n);
for (arma::uword i = 0; i < n; i++) {
norms(i) = arma::norm(X.col(i), 2);
}
return norms;
}
/*** R
foo_R <- function(x) {
X <- matrix(x, 2, length(x)/2)
apply(X, 2, norm, type = "2")
}
x <- rnorm(1000)
all.equal(foo_R(x), c(foo_Cpp(x)))
microbenchmark::microbenchmark(foo_R(x), foo_Cpp(x))
*/
> all.equal(foo_R(x), c(foo_Cpp(x)))
[1] TRUE
> microbenchmark::microbenchmark(foo_R(x), foo_Cpp(x))
Unit: microseconds
expr min lq mean median uq max neval
foo_R(x) 17907.290 19640.24 21548.06789 20386.5815 21212.609 50780.584 100
foo_Cpp(x) 5.133 6.34 26.48266 19.4705 21.734 1191.124 100
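For blocks of exactly two values, the reshape can be skipped entirely. The following is a minimal sketch in plain C++ (no Armadillo) that walks the vector pairwise; pair_norms is a made-up name and std::vector stands in for arma::vec.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean norm of each consecutive pair (x[0], x[1]), (x[2], x[3]), ...
// (pair_norms is a hypothetical name used only for this sketch.)
std::vector<double> pair_norms(const std::vector<double>& x) {
    assert(x.size() % 2 == 0);      // length must be divisible by two
    std::vector<double> norms(x.size() / 2);
    for (std::size_t i = 0; i < norms.size(); ++i) {
        norms[i] = std::hypot(x[2 * i], x[2 * i + 1]);
    }
    return norms;
}
```

This relies on the same layout as the matrix approach above: with a 2-row column-major matrix, each column is one adjacent pair of the original vector.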

Armadillo C++: Sorting a vector in terms of two other vectors

My question relates to a sorting exercise, which I can undertake easily (but perhaps slowly) in R and would like to undertake in C++ in order to speed up my code.
Consider three vectors of the same size a, b and c. In R, the following command would sort the vector a first in terms of b and then, in case of ties, would further sort in terms of c.
a<-a[order(b,c)]
Example:
a<-c(1,2,3,4,5)
b<-c(1,2,1,2,1)
c<-c(5,4,3,2,1)
> a[order(b,c)]
[1] 5 3 1 4 2
Is there an efficient way to undertake this in C++ using Armadillo vectors?
We can write the following C++ solution, which I have in a file SO_answer.cpp:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace arma;
// [[Rcpp::export]]
vec arma_sort(vec x, vec y, vec z) {
// Order the elements of x by sorting y and z;
// we order by y unless there's a tie, then order by z.
// First create a vector of indices
uvec idx = regspace<uvec>(0, x.size() - 1);
// Then sort that vector by the values of y and z
std::sort(idx.begin(), idx.end(), [&](int i, int j){
if ( y[i] == y[j] ) {
return z[i] < z[j];
}
return y[i] < y[j];
});
// And return x in that order
return x(idx);
}
What we've done is take advantage of the fact that std::sort() allows you to sort based on a custom comparator. We use a comparator that compares the elements of z only if the elements of y are equal; otherwise it compares the values of y [1]. Then we can compile the file and test the function in R:
library(Rcpp)
sourceCpp("SO_answer.cpp")
set.seed(1234)
x <- sample(1:10)
y <- sample(1:10)
z <- sample(1:10)
y[sample(1:10, 1)] <- 1 # create a tie
all.equal(x[order(y, z)], c(arma_sort(x, y, z))) # check against R
# [1] TRUE # Good
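The comparator can also be written as a lexicographic comparison of (y, z) pairs. Below is a minimal sketch in plain C++ using std::tie, with std::vector standing in for arma::vec; the name sort_by_two_keys is made up for illustration. One caveat: std::sort is not stable, so elements that tie on both y and z may come out in a different order than with R's order().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <tuple>
#include <vector>

// Return x reordered by y, breaking ties in y by z,
// similar to x[order(y, z)] in R.
// (sort_by_two_keys is a hypothetical name used only for this sketch.)
std::vector<double> sort_by_two_keys(const std::vector<double>& x,
                                     const std::vector<double>& y,
                                     const std::vector<double>& z) {
    assert(x.size() == y.size() && y.size() == z.size());
    std::vector<std::size_t> idx(x.size());
    std::iota(idx.begin(), idx.end(), 0);   // 0, 1, ..., n-1
    std::sort(idx.begin(), idx.end(), [&](std::size_t i, std::size_t j) {
        // lexicographic comparison of the pairs (y, z)
        return std::tie(y[i], z[i]) < std::tie(y[j], z[j]);
    });
    std::vector<double> out(x.size());
    for (std::size_t k = 0; k < idx.size(); ++k) out[k] = x[idx[k]];
    return out;
}
```

With the vectors from the question (a = 1..5, b = 1,2,1,2,1, c = 5,4,3,2,1) this gives 5 3 1 4 2, matching a[order(b,c)].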
Of course, we must also consider whether this actually gives you any performance increase, which is the whole reason why you're doing this. Let's benchmark:
library(microbenchmark)
microbenchmark(r = x[order(y, z)],
arma = arma_sort(x, y, z),
times = 1e4)
Unit: microseconds
expr min lq mean median uq max neval cld
r 36.040 37.23 39.386160 37.64 38.32 3316.286 10000 b
arma 5.055 6.07 7.155676 7.00 7.53 107.230 10000 a
On my machine, it looks like you get about a 5-6X increase in speed with small vectors, though this advantage doesn't hold as well when you scale up:
x <- sample(1:100)
y <- sample(1:100)
z <- sample(1:100)
y[sample(1:100, 10)] <- 1 # create some ties
all.equal(x[order(y, z)], c(arma_sort(x, y, z))) # check against R
# [1] TRUE # Good
microbenchmark(r = x[order(y, z)],
arma = arma_sort(x, y, z),
times = 1e4)
Unit: microseconds
expr min lq mean median uq max neval cld
r 44.50 46.360 48.01275 46.930 47.755 294.051 10000 b
arma 10.76 12.045 16.30033 13.015 13.715 5262.132 10000 a
x <- sample(1:1000)
y <- sample(1:1000)
z <- sample(1:1000)
y[sample(1:100, 10)] <- 1 # create some ties
all.equal(x[order(y, z)], c(arma_sort(x, y, z))) # check against R
# [1] TRUE # Good
microbenchmark(r = x[order(y, z)],
arma = arma_sort(x, y, z),
times = 1e4)
Unit: microseconds
expr min lq mean median uq max neval cld
r 113.765 118.7950 125.7387 120.5075 122.4475 3373.696 10000 b
arma 82.690 91.3925 104.0755 95.2350 99.4325 6040.162 10000 a
It's still faster, but by less than 2X once you're at vectors of length 1000. This is probably why F. Privé said this operation should be fast enough in R. While moving to C++ using Rcpp can give you great performance advantages, the extent to which you get gains is largely dependent on context, as mentioned many times by Dirk Eddelbuettel in answers to various questions here.
[1] Note that typically for sorting Armadillo vectors I would suggest using sort() or sort_index() (see the Armadillo docs here). If you're trying to sort a vec by the values of a second vec, you could use x(arma::sort_index(y)) as I indicated in an answer to a related question here. You can even use stable_sort_index() to preserve the original order of tied elements. However, I couldn't figure out how to use these functions to solve the specific problem you present here.
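One general way to get a two-key ordering out of stable single-key sorts is to sort by the secondary key first and then stable-sort by the primary key. The sketch below shows this with std::stable_sort in plain C++; order_two_pass is a made-up name, and whether Armadillo's sort_index() and stable_sort_index() can be composed the same way is an assumption that has not been checked here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Indices that order by y, with ties in y ordered by z,
// built from two single-key stable sorts.
// (order_two_pass is a hypothetical name used only for this sketch.)
std::vector<std::size_t> order_two_pass(const std::vector<double>& y,
                                        const std::vector<double>& z) {
    assert(y.size() == z.size());
    std::vector<std::size_t> idx(y.size());
    std::iota(idx.begin(), idx.end(), 0);
    // Pass 1: order by the secondary key z.
    std::stable_sort(idx.begin(), idx.end(),
                     [&](std::size_t i, std::size_t j) { return z[i] < z[j]; });
    // Pass 2: stable sort by the primary key y; equal y keep their z order.
    std::stable_sort(idx.begin(), idx.end(),
                     [&](std::size_t i, std::size_t j) { return y[i] < y[j]; });
    return idx;
}
```

The result is a zero-based index vector, so it plays the role of idx in arma_sort above.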

Matrix multiplication in Rcpp

First of all, I am a novice user so forgive my general ignorance. I am looking for a faster alternative to the %*% operator in R. Even though older posts suggest the use of RcppArmadillo, I have tried for 2 hours to make RcppArmadillo work without success. I always run into lexical issues that yield 'unexpected ...' errors. I have found the following function in Rcpp which I can make work:
library(Rcpp)
func <- '
NumericMatrix mmult( NumericMatrix m , NumericMatrix v, bool byrow=true )
{
if( m.nrow() != v.nrow() ) stop("Non-conformable arrays") ;
if( m.ncol() != v.ncol() ) stop("Non-conformable arrays") ;
NumericMatrix out(m) ;
for (int i = 0; i < m.nrow(); i++)
{
for (int j = 0; j < m.ncol(); j++)
{
out(i,j)=m(i,j) * v(i,j) ;
}
}
return out ;
}
'
This function, however, performs element-wise multiplication and does not behave as %*%. Is there an easy way to modify the above code to achieve the intended result?
EDIT:
I have come up with a function using RcppEigen that seems to beat %*%:
library(inline)
etest <- cxxfunction(signature(tm="NumericMatrix",
tm2="NumericMatrix"),
plugin="RcppEigen",
body="
NumericMatrix tm22(tm2);
NumericMatrix tmm(tm);
const Eigen::Map<Eigen::MatrixXd> ttm(as<Eigen::Map<Eigen::MatrixXd> >(tmm));
const Eigen::Map<Eigen::MatrixXd> ttm2(as<Eigen::Map<Eigen::MatrixXd> >(tm22));
Eigen::MatrixXd prod = ttm*ttm2;
return(wrap(prod));
")
set.seed(123)
M1 <- matrix(sample(1e3),ncol=50)
M2 <- matrix(sample(1e3),nrow=50)
identical(etest(M1,M2), M1 %*% M2)
[1] TRUE
res <- microbenchmark(
etest(M1,M2),
M1 %*% M2,
times=10000L)
res
Unit: microseconds
expr min lq mean median uq max neval
etest(M1, M2) 5.709 6.61 7.414607 6.611 7.211 49.879 10000
M1 %*% M2 11.718 12.32 13.505272 12.621 13.221 58.592 10000
There are good reasons to rely on existing libraries / packages for standard tasks. The routines in these libraries are:
- optimized
- thoroughly tested
- a good means to keep the code compact, human-readable, and easy to maintain.
Therefore I think that using RcppArmadillo or RcppEigen should be preferable here. However, to answer your question, below is one possible Rcpp implementation of matrix multiplication:
library(Rcpp)
cppFunction('NumericMatrix mmult(const NumericMatrix& m1, const NumericMatrix& m2){
if (m1.ncol() != m2.nrow()) stop ("Incompatible matrix dimensions");
NumericMatrix out(m1.nrow(),m2.ncol());
NumericVector rm1, cm2;
for (int i = 0; i < m1.nrow(); ++i) {
rm1 = m1(i,_);
for (int j = 0; j < m2.ncol(); ++j) {
cm2 = m2(_,j);
out(i,j) = std::inner_product(rm1.begin(), rm1.end(), cm2.begin(), 0.);
}
}
return out;
}')
Let's test it:
A <- matrix(c(1:6),ncol=2)
B <- matrix(c(0:7),nrow=2)
mmult(A,B)
# [,1] [,2] [,3] [,4]
#[1,] 4 14 24 34
#[2,] 5 19 33 47
#[3,] 6 24 42 60
identical(mmult(A,B), A %*% B)
#[1] TRUE
Hope this helps.
As benchmark tests show, the above Rcpp code is slower than R's built-in %*% operator. I assume that, while my Rcpp code can certainly be improved, it will be hard to beat the optimized code behind %*% in terms of performance:
library(microbenchmark)
set.seed(123)
M1 <- matrix(rnorm(1e4),ncol=100)
M2 <- matrix(rnorm(1e4),nrow=100)
identical(M1 %*% M2, mmult(M1,M2))
#[1] TRUE
res <- microbenchmark(
mmult(M1,M2),
M1 %*% M2,
times=1000L)
#> res
#Unit: microseconds
# expr min lq mean median uq max neval cld
# mmult(M1, M2) 1466.855 1484.8535 1584.9509 1494.0655 1517.5105 2699.643 1000 b
# M1 %*% M2 602.053 617.9685 687.6863 621.4335 633.7675 2774.954 1000 a
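One direction for improving a hand-written version is to avoid the row and column copies and to walk memory contiguously. The following is a minimal sketch in plain C++ that stores matrices as flat column-major vectors (the layout R uses) and orders the loops so the innermost one runs down a column. The name matmul_colmajor and the explicit dimension arguments are made up for illustration, and no claim is made that it competes with the optimized code behind %*%.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major matrices stored as flat vectors, as R stores them:
// element (i, j) of an n-row matrix lives at index i + j * n.
// Computes C = A * B with A (n x k) and B (k x m).
// (matmul_colmajor is a hypothetical name used only for this sketch.)
std::vector<double> matmul_colmajor(const std::vector<double>& A,
                                    const std::vector<double>& B,
                                    std::size_t n, std::size_t k, std::size_t m) {
    assert(A.size() == n * k && B.size() == k * m);
    std::vector<double> C(n * m, 0.0);
    for (std::size_t j = 0; j < m; ++j) {          // column of B and C
        for (std::size_t p = 0; p < k; ++p) {      // shared dimension
            const double b = B[p + j * k];
            for (std::size_t i = 0; i < n; ++i) {  // walk down a column of A
                C[i + j * n] += A[i + p * n] * b;
            }
        }
    }
    return C;
}
```

With the A and B from the test above (3 x 2 and 2 x 4), the result read column by column is 4 5 6, 14 19 24, 24 33 42, 34 47 60.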
I would encourage you to try to work out your issues with RcppArmadillo. Using it is as simple as this example, which is also created by calling RcppArmadillo.package.skeleton():
// another simple example: outer product of a vector,
// returning a matrix
//
// [[Rcpp::export]]
arma::mat rcpparma_outerproduct(const arma::colvec & x) {
arma::mat m = x * x.t();
return m;
}
// and the inner product returns a scalar
//
// [[Rcpp::export]]
double rcpparma_innerproduct(const arma::colvec & x) {
double v = arma::as_scalar(x.t() * x);
return v;
}
There is actually more code in the example but this should give you an idea.
The following approach can also be used:
NumericMatrix mmult(NumericMatrix m, NumericMatrix v)
{
Environment base("package:base");
Function mat_Mult = base["%*%"];
return(mat_Mult(m, v));
}
With this approach, we use the operator %*% of R.

How to speed up this Rcpp function?

I wish to implement a simple split-apply-combine routine in Rcpp where a dataset (matrix) is split up into groups, and then the groupwise column sums are returned. This is a procedure easily implemented in R, but it often takes quite some time. I have managed to implement an Rcpp solution that beats the performance of R, but I wonder if I can further improve upon it. To illustrate, here is some code, first for the use of R:
n <- 50000
k <- 50
set.seed(42)
X <- matrix(rnorm(n*k), nrow=n)
g=rep(1:8,length.out=n )
use.for <- function(mat, ind){
sums <- matrix(NA, nrow=length(unique(ind)), ncol=ncol(mat))
for(i in seq_along(unique(ind))){
sums[i,] <- colSums(mat[ind==i,])
}
return(sums)
}
use.apply <- function(mat, ind){
apply(mat,2, function(x) tapply(x, ind, sum))
}
use.dt <- function(mat, ind){ # based on Roland's answer
dt <- as.data.table(mat)
dt[, cvar := ind]
dt2 <- dt[,lapply(.SD, sum), by=cvar]
as.matrix(dt2[,cvar:=NULL])
}
It turns out that the for-loop is actually quite fast and is the easiest (for me) to implement with Rcpp. It works by creating a submatrix for each group and then calling colSums on the matrix. This is implemented using RcppArmadillo:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
arma::mat use_arma(arma::mat X, arma::colvec G){
arma::colvec gr = arma::unique(G);
int gr_n = gr.n_rows;
int ncol = X.n_cols;
arma::mat out = zeros(gr_n, ncol);
for(int g=0; g<gr_n; g++){
int g_id = gr(g);
arma::uvec subvec = find(G==g_id);
arma::mat submat = X.rows(subvec);
arma::rowvec res = sum(submat,0);
out.row(g) = res;
}
return out;
}
However, based on answers to this question, I learned that creating copies is expensive in C++ (just as in R), but that loops are not as bad as they are in R. Since the arma solution relies on creating matrices (submat in the code) for each group, my guess is that avoiding this will speed up the process even further. Hence, here is a second implementation based on Rcpp only, using a loop:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix use_Rcpp(NumericMatrix X, IntegerVector G){
IntegerVector gr = unique(G);
std::sort(gr.begin(), gr.end());
int gr_n = gr.size();
int nrow = X.nrow(), ncol = X.ncol();
NumericMatrix out(gr_n, ncol);
for(int g=0; g<gr_n; g++){
int g_id = gr(g);
for (int j = 0; j < ncol; j++) {
double total = 0;
for (int i = 0; i < nrow; i++) {
if (G(i) != g_id) continue; // not sure how else to do this
total += X(i, j);
}
out(g,j) = total;
}
}
return out;
}
Benchmarking these solutions, including the use.dt version provided by @Roland (my previous version discriminated unfairly against data.table), as well as the dplyr solution suggested by @beginneR, yields the following:
library(rbenchmark)
benchmark(use.for(X,g), use.apply(X,g), use.dt(X,g), use.dplyr(X,g), use_arma(X,g), use_Rcpp(X,g),
columns = c("test", "replications", "elapsed", "relative"), order = "relative", replications = 1000)
test replications elapsed relative
# 5 use_arma(X, g) 1000 29.65 1.000
# 4 use.dplyr(X, g) 1000 42.05 1.418
# 3 use.dt(X, g) 1000 56.94 1.920
# 1 use.for(X, g) 1000 60.97 2.056
# 6 use_Rcpp(X, g) 1000 113.96 3.844
# 2 use.apply(X, g) 1000 301.14 10.156
My intuition (use_Rcpp better than use_arma) did not turn out right. Having said that, I guess that the line if (G(i) != g_id) continue; in my use_Rcpp function slows down everything. I am happy to learn about alternatives to set this up.
I am happy that I have achieved the same task in half the time it takes R to do it, but maybe the many "Rcpp is much faster than R" examples have skewed my expectations, and I am wondering if I can speed this up even more. Does anyone have an idea? I also welcome any programming / coding comments in general since I am relatively new to Rcpp and C++.
No, it's not the for loop that you need to beat:
library(data.table)
#it doesn't seem fair to include calls to library in benchmarks
#you need to do that only once in your session after all
use.dt2 <- function(mat, ind){
dt <- as.data.table(mat)
dt[, cvar := ind]
dt2 <- dt[,lapply(.SD, sum), by=cvar]
as.matrix(dt2[,cvar:=NULL])
}
all.equal(use.dt(X,g), use.dt2(X,g))
#TRUE
benchmark(use.for(X,g), use.apply(X,g), use.dt(X,g), use.dt2(X,g),
columns = c("test", "replications", "elapsed", "relative"),
order = "relative", replications = 50)
# test replications elapsed relative
#4 use.dt2(X, g) 50 3.12 1.000
#1 use.for(X, g) 50 4.67 1.497
#3 use.dt(X, g) 50 7.53 2.413
#2 use.apply(X, g) 50 17.46 5.596
Maybe you're looking for (the oddly named) rowsum
library(microbenchmark)
use.rowsum = rowsum
and
> all.equal(use.for(X, g), use.rowsum(X, g), check.attributes=FALSE)
[1] TRUE
> microbenchmark(use.for(X, g), use.rowsum(X, g), times=5)
Unit: milliseconds
expr min lq median uq max neval
use.for(X, g) 126.92876 127.19027 127.51403 127.64082 128.06579 5
use.rowsum(X, g) 10.56727 10.93942 11.01106 11.38697 11.38918 5
Here are my critiques, as in-line comments on your Rcpp solution.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix use_Rcpp(NumericMatrix X, IntegerVector G){
// Rcpp has a sort_unique() function, which combines the
// sort and unique steps into one, and is often faster than
// performing the operations separately. Try `sort_unique(G)`
IntegerVector gr = unique(G);
std::sort(gr.begin(), gr.end());
int gr_n = gr.size();
int nrow = X.nrow(), ncol = X.ncol();
// This constructor zero-initializes memory (kind of like
// making a copy). You should use:
//
// NumericMatrix out = no_init(gr_n, ncol)
//
// to ensure the memory is allocated, but not zeroed.
//
// EDIT: We don't have no_init for matrices right now, but you can hack
// around that with:
//
// NumericMatrix out(Rf_allocMatrix(REALSXP, gr_n, ncol));
NumericMatrix out(gr_n, ncol);
for(int g=0; g<gr_n; g++){
// subsetting with operator[] is cheaper, so use gr[g] when
// you can be sure bounds checks are not necessary
int g_id = gr(g);
for (int j = 0; j < ncol; j++) {
double total = 0;
for (int i = 0; i < nrow; i++) {
// similarly here
if (G(i) != g_id) continue; // not sure how else to do this
total += X(i, j);
}
// IIUC, you are filling the matrix row-wise. This is slower as
// R matrices are stored in column-major format, and so filling
// matrices column-wise will be faster.
out(g,j) = total;
}
}
return out;
}
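As one possible alternative to the if (G(i) != g_id) continue; test, each row can be mapped to its output row up front, so that every element of X is visited once instead of once per group. The sketch below is plain C++ with flat column-major vectors standing in for NumericMatrix; group_colsums and its dimension arguments are made-up names, and it has not been benchmarked against the versions above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// X is column-major with nrow rows and ncol columns; G[i] is the group of row i.
// Returns a column-major (n_groups x ncol) matrix of per-group column sums,
// with groups ordered by increasing id.
// (group_colsums is a hypothetical name used only for this sketch.)
std::vector<double> group_colsums(const std::vector<double>& X,
                                  const std::vector<int>& G,
                                  std::size_t nrow, std::size_t ncol) {
    assert(X.size() == nrow * ncol && G.size() == nrow);
    // Map each distinct group id to an output row; std::map keeps ids sorted.
    std::map<int, std::size_t> row_of;
    for (int g : G) row_of.emplace(g, 0);
    std::size_t next = 0;
    for (auto& kv : row_of) kv.second = next++;
    const std::size_t n_groups = row_of.size();

    // Precompute the output row of every input row so the hot loop does no lookups.
    std::vector<std::size_t> target(nrow);
    for (std::size_t i = 0; i < nrow; ++i) target[i] = row_of[G[i]];

    std::vector<double> out(n_groups * ncol, 0.0);
    for (std::size_t j = 0; j < ncol; ++j) {        // one column at a time
        for (std::size_t i = 0; i < nrow; ++i) {    // contiguous reads of X
            out[target[i] + j * n_groups] += X[i + j * nrow];
        }
    }
    return out;
}
```

Because the accumulation needs zeroed memory, this sketch keeps the zero-initialized output rather than an uninitialized allocation.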