Rcpp programming efficiency - C++

I am a beginner with Rcpp. I have written an Rcpp function that operates on two 3-dimensional arrays, Array1 and Array2. Suppose Array1 has dimensions (1000, 100, 40) and Array2 has dimensions (1000, 96, 40).
For each pair of indices (i, j), I would like to perform wilcox.test using:
wilcox.test(Array1[i, j,], Array2[i,,])
In R, nested for loops completed the calculation in about half an hour.
I then rewrote it in Rcpp, but the Rcpp version took an hour to produce the same results. I thought it should be faster since it is written in C++, so I suspect my coding style is the cause of the inefficiency.
The following is my Rcpp code. Would you mind helping me find out what improvements I should make? I appreciate it!
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector Cal(NumericVector Array1, NumericVector Array2, Function wilc) {
  NumericVector vecArray1(Array1);
  IntegerVector arrayDims1 = vecArray1.attr("dim");
  NumericVector vecArray2(Array2);
  IntegerVector arrayDims2 = vecArray2.attr("dim");

  // Wrap the R memory directly (copy_aux_mem = false) instead of copying it.
  arma::cube cubeArray1(vecArray1.begin(), arrayDims1[0], arrayDims1[1], arrayDims1[2], false);
  arma::cube cubeArray2(vecArray2.begin(), arrayDims2[0], arrayDims2[1], arrayDims2[2], false);

  arma::mat STORE = arma::mat(arrayDims1[0], arrayDims1[1]);

  for (int i = 0; i < arrayDims1[1]; i++) {
    for (int j = 0; j < arrayDims1[0]; j++) {
      // Slice out the (j, i) series and the matching (j) control block.
      arma::vec v_cl = cubeArray1.subcube(arma::span(j), arma::span(i), arma::span::all);
      arma::vec v_ct = arma::vectorise(cubeArray2.subcube(arma::span(j), arma::span::all, arma::span::all));
      // Call back into R for every (i, j): this round trip dominates the run time.
      Rcpp::List resu = wilc(v_cl, v_ct);
      STORE(j, i) = resu[2];  // p.value is the third element of the htest list
    }
  }
  return Rcpp::wrap(STORE);
}
The function wilc is wilcox.test from R.
The following is part of my R code implementing the same idea, where CELLS and CTRLS are two 3D arrays in R:
for (i in 1:ncol(CELLS)) {
  print(i)  # progress indicator
  for (j in 1:dim(CELLS)[1]) {
    wtest <- wilcox.test(CELLS[j, i, ], CTRLS[j, , ])
    TSTAT_clcl[j, i] <- wtest$p.value
  }
}

The required disclaimer:
Embedding R code in C++ and expecting a speed-up is a fool's game. You will need to rewrite wilcox.test fully in C++ instead of calling back into R; otherwise, you lose whatever speed advantage you would gain.
In particular, I wrote a post illustrating this conundrum using the diff function in R. In it, I compared a pure C++ implementation, a C++ implementation that calls an R function within the routine, and a pure R implementation. Borrowing the microbenchmark from that post illustrates the issue:
     expr     min      lq      mean  median      uq      max neval
 arma_fun  26.117  27.318  37.54248  28.218  29.869  751.087   100
    r_fun 127.883 134.187 212.81091 138.390 151.148 1012.856   100
 rcpp_fun 250.663 265.972 356.10870 274.228 293.590 1430.426   100
Thus, the pure C++ implementation had the largest speed-up.
Hence, the takeaway is: to cut the run time, translate the wilcox.test routine into pure C++. Otherwise, it is meaningless to write the outer loop in C++, because the C++ code must stop and wait for results from R before continuing, and that call boundary traditionally carries a lot of overhead to ensure the data is well protected.
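To make that concrete, here is a minimal sketch of such a translation, with a hypothetical helper wilcoxP: a two-sided rank-sum p-value using the normal approximation. It ignores ties and the continuity correction, so its p-values will deviate slightly from wilcox.test() whenever ties occur.

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical helper: two-sided Wilcoxon rank-sum p-value, normal
// approximation, no tie correction and no continuity correction.
double wilcoxP(const arma::vec& x, const arma::vec& y) {
  const int n1 = x.n_elem, n2 = y.n_elem, n = n1 + n2;
  std::vector< std::pair<double, int> > v(n);   // (value, 1 if from x)
  for (int i = 0; i < n1; i++) v[i]      = std::make_pair(x(i), 1);
  for (int i = 0; i < n2; i++) v[n1 + i] = std::make_pair(y(i), 0);
  std::sort(v.begin(), v.end());                // ranks = sorted positions
  double rsum = 0.0;                            // rank sum of sample x
  for (int i = 0; i < n; i++)
    if (v[i].second == 1) rsum += i + 1;
  const double U  = rsum - n1 * (n1 + 1) / 2.0; // Mann-Whitney U statistic
  const double mu = 0.5 * n1 * n2;              // null mean of U
  const double sd = std::sqrt(1.0 * n1 * n2 * (n + 1) / 12.0);
  const double z  = (U - mu) / sd;
  return 2.0 * R::pnorm(-std::fabs(z), 0.0, 1.0, 1, 0);
}

Calling wilcoxP(v_cl, v_ct) in the inner loop instead of wilc(v_cl, v_ct) removes the R round trip entirely.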

Related

Sort an Array using C++ in R

I need to arrange a dataframe of prices, row by row, in ascending order.
Doing it with an R for loop is quite slow, and a friend of mine tipped me off to use Rcpp.
But I'm having quite a hard time developing a loop in C++ that works.
#include <Rcpp.h>
// [[Rcpp::export]]
using namespace std;
List min(NumericVector x)
{
  for (unsigned int i = 0; i < x.size(); i++) {
    vector<int>& vec = x[i];
    NumericVector Value sort(vec.begin(), vec.end());
  }
  Return Value;
}
I'm not used to C++, and I would like to know why it keeps saying that my sort is wrong. The goal: arrange my dataframe by row.
Welcome (again) to StackOverflow and Rcpp! Two big worlds with much to discover...
sort() is available as a member function:
> Rcpp::cppFunction("NumericVector srt(NumericVector x) { return(x.sort()); }")
> srt(c(2,3,4,1.5,3.2))
[1] 1.5 2.0 3.0 3.2 4.0
>
Note that an advanced question hides inside this simple one: the sort() member function sorts in place, so the above mutates its input. That can be convenient ("hey, no new heap object to return") or confusing, depending on your vantage point. We cover it in most Rcpp tutorials, but you may have other, more pressing issues. Keep on it!
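If mutating the input is unwanted, a minimal sketch (srt2 is an illustrative name) is to sort a clone() of the vector instead:
> Rcpp::cppFunction("NumericVector srt2(NumericVector x) {
+   NumericVector y = clone(x);  // deep copy: the input stays intact
+   return y.sort();
+ }")
> x <- c(2, 3, 4, 1.5, 3.2)
> srt2(x)
[1] 1.5 2.0 3.0 3.2 4.0
> x
[1] 2.0 3.0 4.0 1.5 3.2
The extra heap allocation is the price of leaving the caller's vector untouched.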

Using R and Rcpp, how to multiply two matrices that are sparse Matrix::csr/csc format?

The following code works as expected:
matrix.cpp
// [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h>

// [[Rcpp::export]]
SEXP eigenMatTrans(Eigen::MatrixXd A) {
  Eigen::MatrixXd C = A.transpose();
  return Rcpp::wrap(C);
}

// [[Rcpp::export]]
SEXP eigenMatMult(Eigen::MatrixXd A, Eigen::MatrixXd B) {
  Eigen::MatrixXd C = A * B;
  return Rcpp::wrap(C);
}

// [[Rcpp::export]]
SEXP eigenMapMatMult(const Eigen::Map<Eigen::MatrixXd> A, Eigen::Map<Eigen::MatrixXd> B) {
  Eigen::MatrixXd C = A * B;
  return Rcpp::wrap(C);
}
This uses the C++ Eigen library for matrices; see https://eigen.tuxfamily.org/dox.
In R, I can access those functions:
library(Rcpp);
Rcpp::sourceCpp('matrix.cpp');
A <- matrix(rnorm(10000), 100, 100);
B <- matrix(rnorm(10000), 100, 100);
library(microbenchmark);
microbenchmark(eigenMatTrans(A), t(A), A%*%B, eigenMatMult(A, B), eigenMapMatMult(A, B))
This shows that R performs pretty well on reordering (transpose), while multiplying shows some advantage for Eigen.
Using the Matrix library, I can convert a normal matrix to a sparse matrix.
Example from https://cmdlinetips.com/2019/05/introduction-to-sparse-matrices-in-r/
library(Matrix);
data<- rnorm(1e6)
zero_index <- sample(1e6)[1:9e5]
data[zero_index] <- 0
A = matrix(data, ncol=1000)
A.csr = as(A, "dgRMatrix");
B.csr = t(A.csr);
A.csc = as(A, "dgCMatrix");
B.csc = t(A.csc);
So if I wanted to multiply A.csr by B.csr using Eigen, how would I do that in C++? I do not want to convert types if I don't have to; it is a memory-size thing.
A.csr %*% B.csr is not yet implemented.
A.csc %*% B.csc works.
I would like to microbenchmark the different options and see which is most efficient at various matrix sizes. In the end, I will have a matrix that is about 1% sparse with 5 million rows and cols ...
There's a reason that dgRMatrix cross-product functions are not yet implemented; in fact, they should not be implemented, because they would enable bad practice.
There are a few performance considerations when working with sparse matrices:
Accessing marginal views against the major marginal orientation is highly inefficient. For instance, a column iterator in a dgRMatrix and a row iterator in a dgCMatrix need to loop through almost all elements of the matrix to find the ones in just that column or row. See this Rcpp gallery post for additional enlightenment.
A matrix cross-product is simply a dot product between all combinations of columns. This means the penalty of using a column iterator in a dgRMatrix (vs. a column iterator in a dgCMatrix) is multiplied by the number of column combinations.
Cross-product functions in R are highly optimized and are not (in my experience) significantly faster than Eigen, Armadillo, or equivalent STL variants. They are parallelized, and the Matrix package takes wonderful advantage of these optimized algorithms. I have written parallelized STL cross-product variants in C++ using Rcpp structures and I don't see any increase in performance.
If you're really going this route, check out my Rcpp gallery post on sparse matrix structures in Rcpp. It is preferable to Eigen and Armadillo sparse matrices if memory is a concern, because Eigen and Armadillo perform a deep copy rather than referencing an R object already in memory.
At 1% density, the inefficiencies of row iterators will be greater than at, say, 5 or 10% density. I do most of my tests at 5% density, and binary operations generally take 5-10x longer with row iterators than with column iterators.
There may be applications where row-major ordering shines (e.g. see the work by Dmitry Selivanov on CSR matrices and irlba svd), but this is absolutely not one of them; so much so, in fact, that you are better off doing an in-place conversion to get to a CSC matrix.
tl;dr: column-wise cross-product in row-major matrices is the ultimatum of inefficiency.
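If you do convert to CSC, a minimal sketch of the multiply (eigenSparseMatMult is an illustrative name; it assumes both inputs arrive as dgCMatrix objects) can use RcppEigen's mapped sparse type, which wraps the memory R already holds instead of deep-copying it:

// [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h>

// [[Rcpp::export]]
SEXP eigenSparseMatMult(const Eigen::MappedSparseMatrix<double> A,
                        const Eigen::MappedSparseMatrix<double> B) {
  Eigen::SparseMatrix<double> C = A * B;  // CSC-by-CSC product
  return Rcpp::wrap(C);                   // comes back as a dgCMatrix
}

From R, after sourceCpp(), eigenSparseMatMult(A.csc, B.csc) should return a dgCMatrix; note that the product C is still materialised in new memory.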

C++: passing Eigen-defined matrices to functions, and using them - best practice

I have a function which requires me to pass a fairly large matrix (which I created using Eigen), ranging in dimensions from 200x200 to 1000x1000. The function is more complex than this, but the bare bones of it are:
#include <Eigen/Dense>
using namespace Eigen;

int main()
{
  MatrixXi mIndices = MatrixXi::Zero(1000, 1000);
  MatrixXi* pMatrix = &mIndices;
  MatrixXi mTest;
  for (int i = 0; i < 10000; i++)
  {
    mTest = pMatrix[0];  // dereferences the pointer and copies all 1,000,000 elements
    // Then do stuff to the copy
  }
}
Is the reason that it takes much longer to run with a larger matrix that it takes longer to find the available space in RAM for the array when I assign it to mTest? When I switch to a sparse array, this seems quite a lot quicker.
If I need to pass around large matrices, and I want to minimise the incremental effect of matrix size on runtime, what is best practice here? At the moment, the same program runs slower in C++ than it does in Matlab, and obviously I would like to speed it up!
Best,
Ben
In the code you show, you are copying a 1,000,000-element matrix 10,000 times. The assignment in the loop creates a copy.
Generally if you're passing an Eigen matrix to another function, it can be beneficial to accept the argument by reference.
It's not really clear from your code what you're trying to achieve, however.
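As a minimal sketch of that advice (the function name and body are illustrative, not from the question): accepting the matrix by const reference lets a loop read it without the per-iteration copy that pass-by-value or assignment would make.

#include <Eigen/Dense>
using Eigen::MatrixXi;

// Illustrative helper: the const reference means no copy is made on each call.
long long firstRowSum(const MatrixXi& m) {
  return m.row(0).sum();  // reads through the reference; nothing is copied
}

int main() {
  MatrixXi mIndices = MatrixXi::Zero(1000, 1000);
  long long total = 0;
  for (int i = 0; i < 10000; i++)
    total += firstRowSum(mIndices);  // no 1,000,000-element copy per iteration
  return total == 0 ? 0 : 1;
}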

Can Rcpp replace the runif function in R?

I have just started using the Rcpp package in R, my learning is inspired by the Advanced R course by Hadley Wickham.
Within RStudio I have the following .cpp file. The question is more general, but this example helps.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector runifC(int n, double min = 0, double max = 1) {
  NumericVector out(n);
  for (int i = 0; i < n; ++i) {
    out[i] = min + ((double) rand() / RAND_MAX) * (max - min);
  }
  return out;
}
/*** R
library(microbenchmark)
microbenchmark(
'R unif-1' = runif(1),
'C++ unif-1' = runifC(1),
'R unif-100' = runif(100),
'C++ unif-100' = runifC(100),
'R unif-1000' = runif(1000),
'C++ unif-1000' = runifC(1000),
'R unif-100000' = runif(100000),
'C++ unif-100000' = runifC(100000)
)
*/
When I source/save the file it shows me the performance output.
Unit: nanoseconds
expr min lq mean median uq max neval
R unif-1 2061 2644.5 4000.71 3456.0 4297.0 15402 100
C++ unif-1 710 1190.0 1815.11 1685.0 2168.5 5776 100
R unif-100 4717 5566.5 6794.14 6563.0 7435.5 16600 100
C++ unif-100 1450 1997.5 2663.29 2591.5 3107.0 5307 100
R unif-1000 28210 29584.5 31310.54 30380.0 31599.0 52879 100
C++ unif-1000 8292 8951.0 10113.78 9462.5 10121.5 25099 100
R unif-100000 2642581 2975117.0 3104580.62 3030938.5 3119489.0 5435046 100
C++ unif-100000 699833 990924.0 1058855.49 1034430.5 1075078.0 1530351 100
I would expect runif to be a very optimised function, yet the C++ code runs much more efficiently. I might be naive here, but if there is such a difference in performance, why aren't all applicable R functions rewritten in C++?
It seems so obvious that many improvements are possible that I feel I must be missing a big reason why not every R function can be blindly ported to C++ for performance.
Edit: for this example it has been shown that using rand() in the C++ version is flawed. The performance gap I noticed is mostly down to rand(); the difference for other functions is not as drastic, so I changed the title of the question.
Please DO NOT USE rand(). Doing so will also get your package kicked off CRAN should you submit it.
See eg this C++ reference page for a warning:
Notes
There are no guarantees as to the quality of the random sequence produced. In the past, some implementations of rand() have had serious shortcomings in the randomness, distribution and period of the sequence produced (in one well-known example, the low-order bit simply alternated between 1 and 0 between calls).
If you are interested in alternate random number generators and timings, see the Rcpp Gallery.
In general, use the generators provided by R which are of excellent statistical quality, and offered in both scalar and vectorised form ("Rcpp Sugar") by Rcpp.
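For instance, a minimal sketch of the vectorised sugar version (runifSugar is an illustrative name) that draws from R's own RNG stream:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector runifSugar(int n, double min = 0, double max = 1) {
  return runif(n, min, max);  // Rcpp sugar: one vectorised call to R's RNG
}

Because Rcpp attributes insert an RNGScope, the draws stay reproducible under set.seed(), just like runif() in R.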
As of R-3.1.1, runif uses the .External interface, which copies its arguments. Luke Tierney changed this to use the .Call interface in R-devel in revision 66110. The .Call interface does not copy its arguments. Rcpp uses the .Call interface.
Your C++ code is still faster under R-devel (using the .Call interface). This is likely because of differences in the random number generator being used. Also, R's functions will generally have more checks than whatever specialized code you write; and those checks take time.

Find minimum of vector in Rcpp

Since last night I have been trying out Rcpp and inline, and so far I am really enjoying it. But I am kind of new to C in general and can only do basic stuff so far, and I am having a hard time finding help online on things like functions.
Something I was working on was a function that finds the minimum of a vector in the global environment. I came up with:
library("inline")
library("Rcpp")
foo <- rnorm(100)
bar <- cxxfunction( signature(),
'
Environment e = Environment::global_env();
NumericVector foo = e["foo"];
int min = 0;  // index of the smallest element seen so far
for (int i = 0; i < foo.size(); i++)
{
    if ( foo[i] < foo[min] ) min = i;
}
return wrap(min + 1);  // +1 for 1-based indexing in R
', plugin = "Rcpp")
bar()
But it seems like there should be an easier way to do this, and it is quite a bit slower than which.min():
system.time(replicate(100000,bar()))
user system elapsed
0.27 0.00 0.26
system.time(replicate(100000,which.min(foo)))
user system elapsed
0.2 0.0 0.2
Am I overlooking a basic C++ or Rcpp function that does this? And if so, where could I find a list of such functions?
I guess this question is related to:
Where can I learn how to write C code to speed up slow R functions?
but different in that I am not really interested in how to incorporate C++ in R, but more in how and where to learn basic C++ code that is usable in R.
Glad you are finding Rcpp useful.
The first comment by Billy is quite correct: there is overhead in the function lookup, and there is overhead in the [] lookup for each element, etc.
Also, a much more common approach is to take a vector you have in R, pass it to a compiled function you create via inline and Rcpp, and have it return the result. Try that. There are plenty of examples in the package and scattered over the rcpp-devel mailing list archives.
Edit: I could not resist trying to set up a very C++ / STL style answer.
R> src <- '
+ Rcpp::NumericVector x(xs);
+ Rcpp::NumericVector::iterator it = // iterator type
+ std::min_element(x.begin(), x.end()); // STL algo
+ return Rcpp::wrap(it - x.begin()); '
R> minfun <- cxxfunction(signature(xs="numeric"), body=src, plugin="Rcpp")
R> minfun(c(7:20, 3:5))
[1] 14
R>
That is not exactly the easiest answer, but it shows how, by using what C++ offers, you can find a minimum element without an (explicit) loop even at the C++ level. But the built-in which.min() function is still faster.
Edit 2: Corrected as per Romain's comment below.
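For completeness, a minimal sketch of the same STL idiom via Rcpp attributes (whichMinC is an illustrative name), returning a 1-based index so it lines up with which.min():

Rcpp::cppFunction('
int whichMinC(Rcpp::NumericVector x) {
  Rcpp::NumericVector::iterator it =
      std::min_element(x.begin(), x.end());  // STL scan for the smallest value
  return (it - x.begin()) + 1;               // +1 to match 1-based R indexing
}')
whichMinC(c(7:20, 3:5))  # [1] 15, same as which.min(c(7:20, 3:5))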