Since last night I have been trying out Rcpp and inline, and so far I am really enjoying it. But I am kind of new to C in general and can only do basic stuff so far, and I am having a hard time finding help online on things like functions.
Something I was working on was a function that finds the minimum of a vector in the global environment. I came up with:
library("inline")
library("Rcpp")
foo <- rnorm(100)
bar <- cxxfunction( signature(),
'
Environment e = Environment::global_env();
NumericVector foo = e["foo"];
int min = 0; // start at the first element; reading an uninitialized index is undefined behavior
for (int i = 0; i < foo.size(); i++)
{
if ( foo[i] < foo[min] ) min = i;
}
return wrap(min+1);
', plugin = "Rcpp")
bar()
But it seems like there should be an easier way to do this, and it is noticeably slower than which.min():
system.time(replicate(100000,bar()))
user system elapsed
0.27 0.00 0.26
system.time(replicate(100000,which.min(foo)))
user system elapsed
0.2 0.0 0.2
Am I overlooking a basic C++ or Rcpp function that does this? And if so, where could I find a list of such functions?
I guess this question is related to:
Where can I learn how to write C code to speed up slow R functions?
but different in that I am not really interested in how to incorporate C++ in R, but more in how and where to learn basic C++ code that is usable in R.
Glad you are finding Rcpp useful.
The first comment by Billy is quite correct. There is overhead in the function lookup and there is overhead in the [] lookup for each element etc.
Also, a much more common approach is to take a vector you have in R, pass it to a compiled function you create via inline and Rcpp, and have it return the result. Try that. There are plenty of examples in the package and scattered over the rcpp-devel mailing list archives.
Edit: I could not resist trying to set up a very C++ / STL style answer.
R> src <- '
+ Rcpp::NumericVector x(xs);
+ Rcpp::NumericVector::iterator it = // iterator type
+ std::min_element(x.begin(), x.end()); // STL algo
+ return Rcpp::wrap(it - x.begin()); '
R> minfun <- cxxfunction(signature(xs="numeric"), body=src, plugin="Rcpp")
R> minfun(c(7:20, 3:5))
[1] 14
R>
That is not exactly the easiest answer, but it shows how, by using what C++ offers, you can find a minimum element without an (explicit) loop even at the C++ level. Note that the returned index is zero-based, which is why the result above is 14 rather than 15. But the builtin min() function is still faster.
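For comparison, here is a minimal sketch of the same std::min_element idea in plain C++, outside of Rcpp. The helper name which_min is hypothetical (it is not an Rcpp function); it assumes a std::vector<double> input and shifts the index by one so that it matches what R's which.min() would report.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper (not part of Rcpp): returns the 1-based position of the
// first minimum, mirroring R's which.min(); returns 0 for an empty vector.
std::size_t which_min(const std::vector<double>& x) {
    if (x.empty()) return 0;
    auto it = std::min_element(x.begin(), x.end());  // STL algorithm, no explicit loop
    assert(it != x.end());                           // holds for any non-empty input
    // Iterator distance is zero-based, so add 1 for R-style indexing.
    return static_cast<std::size_t>(std::distance(x.begin(), it)) + 1;
}
```

With the input from the answer, c(7:20, 3:5), this reports position 15, i.e. the zero-based 14 shown above plus one.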
*Edit 2: Corrected as per Romain's comment below.
I need to arrange a dataframe of prices, row by row, in ascending order.
But doing it in R with a for loop is quite slow.
A friend of mine tipped me off to Rcpp.
But I'm having quite a hard time writing a loop in C++ that works.
#include <Rcpp.h>
// [[Rcpp::export]]
using namespace std;
List min(NumericVector x)
{
for (unsigned int i = 0; i < x.size(); i++) {
vector<int>& vec = x[i];
NumericVector Value sort(vec.begin(), vec.end());
}
Return Value;
}
I'm not used to C++ and I would like to know why it keeps saying that my sort is wrong.
Arrange my dataframe by row.
Welcome (again) to StackOverflow and Rcpp! Two big worlds with much to discover...
sort() is available as a member function:
> Rcpp::cppFunction("NumericVector srt(NumericVector x) { return(x.sort()); }")
> srt(c(2,3,4,1.5,3.2))
[1] 1.5 2.0 3.0 3.2 4.0
>
Note that a more advanced question is hidden inside this simple one: the sort() member function sorts in place, so the above mutates its input. That can be convenient ("hey, no new heap object to return") or confusing, depending on your vantage point. We cover it in most Rcpp tutorials, but you may have other, more pressing issues. Keep at it!
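The in-place versus copy distinction can be sketched in plain C++ with std::vector standing in for NumericVector. Both function names below are hypothetical and only illustrate the two behaviours; inside Rcpp the copy would usually be made with Rcpp::clone() before sorting.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sorts the caller's vector itself: no new object, but the input is changed.
void sort_in_place(std::vector<double>& x) {
    std::sort(x.begin(), x.end());
}

// Takes the argument by value, so the caller's vector is left untouched
// and a sorted copy is returned instead.
std::vector<double> sorted_copy(std::vector<double> x) {
    std::sort(x.begin(), x.end());
    assert(std::is_sorted(x.begin(), x.end()));
    return x;
}
```

Which of the two is preferable depends on whether the caller still needs the original order afterwards.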
I'm trying to write a function which can take in functions as its arguments in Rcpp. I have written an example function in R that shows the kind of functionality that I'm aiming for:
simulate_and_evaluate <- function(simulate, evaluate) {
y <- simulate(1)
eval <- evaluate(y)
return(eval)
}
simulate_fun <- function(n) rnorm(n, 0, 1)
evaluate_fun <- function(x) dnorm(x, 0, 1)
simulate_and_evaluate(simulate = simulate_fun,
evaluate = evaluate_fun)
The function simulate_and_evaluate takes two arguments, both of which are functions: one that simulates a number and one that evaluates a function at this simulated number. So as an example, we can simulate a value from a standard normal and evaluate the density of a standard normal at that point. Does anyone know if there's a way to do this in Rcpp?
Rcpp aims for seamless interfacing of R and C++ objects. As functions are first class R objects represented internally as a type a SEXP can take, we can of course also ship them with Rcpp. There are numerous examples.
So here we simply rewrite your function as a C++ function:
Rcpp::cppFunction("double simAndEval(Function sim, Function eval) {
double y = as<double>(sim(1));
double ev = as<double>(eval(y));
return(ev);
}")
And we can then set the RNG to the same value, run your R function and this C++ function and get the same value. Which is awesome.
R> set.seed(123)
R> simulate_and_evaluate(simulate = simulate_fun,
+ evaluate = evaluate_fun)
[1] 0.341
R> set.seed(123) # reset RNG
R> simAndEval(simulate_fun, evaluate_fun)
[1] 0.341
R>
But as @MrFlick warned you, this will not run any faster: we added no compiled execution of the actual functions, we are merely calling them from C++ rather than R.
The topic has been discussed before. Please search StackOverflow, maybe with a string [rcpp] Function to get some meaningful hits.
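To get an actual speed-up, the simulate and evaluate steps themselves would have to be compiled. A rough sketch of what that could look like in plain C++ follows, using std::function for the two callables. All names here (std_normal_pdf, make_normal_sim, sim_and_eval) are hypothetical, and the generator is std::mt19937 rather than R's RNG, so it will not reproduce the values from set.seed(123).

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <random>

// Standard normal density, the plain C++ counterpart of dnorm(x, 0, 1).
double std_normal_pdf(double x) {
    const double inv_sqrt_2pi = 0.3989422804014327;
    return inv_sqrt_2pi * std::exp(-0.5 * x * x);
}

// Hypothetical factory: returns a callable that draws N(0, 1) values from a
// std::mt19937 seeded with `seed` (this is not R's RNG stream).
std::function<double()> make_normal_sim(unsigned seed) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> dist(0.0, 1.0);
    return [gen, dist]() mutable { return dist(gen); };
}

// Same shape as simAndEval above, but both callables are compiled code,
// so no call goes back into the R interpreter.
double sim_and_eval(const std::function<double()>& sim,
                    const std::function<double(double)>& eval) {
    assert(sim && eval);  // both callables must be set
    return eval(sim());
}
```

A call such as sim_and_eval(make_normal_sim(123), std_normal_pdf) then draws one value and evaluates the density at it entirely in C++.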
I am a beginner with Rcpp. I recently wrote some Rcpp code that operates on two 3-dimensional arrays: Array1 and Array2. Suppose Array1 has dimension (1000, 100, 40) and Array2 has dimension (1000, 96, 40).
I would like to perform wilcox.test using:
wilcox.test(Array1[i, j,], Array2[i,,])
In R, I wrote nested for loops that completed the calculation in about a half hour.
Then I rewrote it with Rcpp. The Rcpp version took an hour to produce the same results. I thought it would be faster since it is written in C++. I guess my coding style is the cause of the low efficiency.
The following is my Rcpp code. Would you mind helping me find out what improvements I should make? I appreciate it!
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector Cal(NumericVector Array1,NumericVector Array2,Function wilc) {
NumericVector vecArray1(Array1);
IntegerVector arrayDims1 = vecArray1.attr("dim");
NumericVector vecArray2(Array2);
IntegerVector arrayDims2 = vecArray2.attr("dim");
arma::cube cubeArray1(vecArray1.begin(), arrayDims1[0], arrayDims1[1], arrayDims1[2], false);
arma::cube cubeArray2(vecArray2.begin(), arrayDims2[0], arrayDims2[1], arrayDims2[2], false);
arma::mat STORE=arma::mat(arrayDims1[0], arrayDims1[1]);
for(int i=0;i<arrayDims1[1];i++)
{
for(int j=0;j<arrayDims1[0];j++){
arma::vec v_cl=cubeArray1.subcube(arma::span(j),arma::span(i),arma::span::all);
//arma::mat tem=cubeArray2.subcube(arma::span(j),arma::span::all,arma::span::all);
//arma::vec v_ct=arma::vectorise(tem);
arma::vec v_ct=arma::vectorise(cubeArray2.subcube(arma::span(j),arma::span::all,arma::span::all));
Rcpp::List resu=wilc(v_cl,v_ct);
STORE(j,i)=resu[2];
}
}
return(Rcpp::wrap(STORE));
}
The function wilc will be wilcox.test from R.
The following is part of my R code for implementing the above idea, where CELLS and CTRLS are two 3D array in R.
for(i in 1:ncol(CELLS)) {
if(T){ print(i) }
for (j in 1:dim(CELLS)[1]) {
wtest = wilcox.test(CELLS[j,i,], CTRLS[j,,])
TSTAT_clcl[j,i] = wtest$p.value
}
}
The required disclaimer:
Embedding R code in C++ and expecting a speed up is a fool's game. You will need to rewrite wilcox.test fully in C++ instead of making a call to R. Otherwise, you lose whatever speed up advantage you get.
In particular, I wrote up a post illustrating this conundrum using the diff function in R. In the post, I compared a pure C++ implementation, a C++ implementation that calls an R function within the routine, and a pure R implementation. Stealing the microbenchmark from that post illustrates the issue.
expr min lq mean median uq max neval
arma_fun 26.117 27.318 37.54248 28.218 29.869 751.087 100
r_fun 127.883 134.187 212.81091 138.390 151.148 1012.856 100
rcpp_fun 250.663 265.972 356.10870 274.228 293.590 1430.426 100
Thus, a pure C++ implementation had the largest speed up.
Hence, the take away is the need to translate the wilcox.test R routine code to a pure C++ implementation to drop the run time. Otherwise, it is meaningless to write the code in C++ because the C++ component must stop and await results from R before continuing. This traditionally has a lot of overhead to ensure the data is well protected.
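As a starting point for such a translation, here is a minimal sketch of just the rank-sum statistic in plain C++. The function name rank_sum_statistic is hypothetical. It counts pairs with x_i > y_j, with ties counted as one half, which to my understanding corresponds to the W statistic that wilcox.test reports for two samples. The p-value (exact, or the normal approximation with tie and continuity corrections) is deliberately left out, and the inputs are assumed to contain no NaN values.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Rank-sum statistic W: the number of pairs (x_i, y_j) with x_i > y_j,
// counting ties as one half. Assumes neither input contains NaN.
// y is taken by value so it can be sorted without touching the caller's data.
double rank_sum_statistic(const std::vector<double>& x, std::vector<double> y) {
    assert(!x.empty() && !y.empty());
    std::sort(y.begin(), y.end());
    double w = 0.0;
    for (double xi : x) {
        // [first, second) is the run of y values equal to xi.
        auto range = std::equal_range(y.begin(), y.end(), xi);
        const double below = static_cast<double>(range.first - y.begin());
        const double tied = static_cast<double>(range.second - range.first);
        w += below + 0.5 * tied;
    }
    return w;
}
```

In the loop from the question the second sample depends only on j, so the sorted control vector could be computed once per j and reused across all i.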
I have just started using the Rcpp package in R; my learning is inspired by the Advanced R course by Hadley Wickham.
Within RStudio I have the following .cpp file. The question is more general but this example helps.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector runifC(int n, double min=0, double max=1) {
NumericVector out(n);
for(int i = 0; i < n; ++i) {
out[i] = min + ((double) rand() / (RAND_MAX)) * (max - min);
}
return out;
}
/*** R
library(microbenchmark)
microbenchmark(
'R unif-1' = runif(1),
'C++ unif-1' = runifC(1),
'R unif-100' = runif(100),
'C++ unif-100' = runifC(100),
'R unif-1000' = runif(1000),
'C++ unif-1000' = runifC(1000),
'R unif-100000' = runif(100000),
'C++ unif-100000' = runifC(100000)
)
*/
When I source/save the file it shows me the performance output.
Unit: nanoseconds
expr min lq mean median uq max neval
R unif-1 2061 2644.5 4000.71 3456.0 4297.0 15402 100
C++ unif-1 710 1190.0 1815.11 1685.0 2168.5 5776 100
R unif-100 4717 5566.5 6794.14 6563.0 7435.5 16600 100
C++ unif-100 1450 1997.5 2663.29 2591.5 3107.0 5307 100
R unif-1000 28210 29584.5 31310.54 30380.0 31599.0 52879 100
C++ unif-1000 8292 8951.0 10113.78 9462.5 10121.5 25099 100
R unif-100000 2642581 2975117.0 3104580.62 3030938.5 3119489.0 5435046 100
C++ unif-100000 699833 990924.0 1058855.49 1034430.5 1075078.0 1530351 100
I would expect that runif would be a very optimised function but the C++ code runs much more efficiently. I might be naive here, but if there is such a difference in performance then why aren't all applicable R functions rewritten in C++?
It seems so obvious that there are a lot of improvements possible that I feel as if I am missing a huge reason of why not all R functions can be blindly copied to C++ for performance.
Edit: for this example it has been shown that the C++ rand() function is slightly flawed. The largest performance gap I noticed involved the rand() function. The performance of other functions doesn't seem to differ as drastically, so I changed the title of the question.
Please DO NOT USE rand(). Doing so will kick your package off CRAN too should you submit it.
See eg this C++ reference page for a warning:
Notes
There are no guarantees as to the quality of the random sequence produced. In the past, some implementations of rand() have had serious shortcomings in the randomness, distribution and period of the sequence produced (in one well-known example, the low-order bit simply alternated between 1 and 0 between calls).
If you are interested in alternate random number generators and timing, see the Rcpp Gallery.
In general, use the generators provided by R which are of excellent statistical quality, and offered in both scalar and vectorised form ("Rcpp Sugar") by Rcpp.
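Inside an Rcpp function the advice above means calling R's generators (for example Rcpp::runif) rather than rand(). For standalone C++ outside of R, a rough equivalent is the <random> header. The sketch below is that swapped-in standalone variant, not R's RNG; the name runif_std and the explicit seed parameter are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical standalone counterpart of runifC(); uses <random>, not R's RNG,
// so it neither respects set.seed() nor matches runif() draw for draw.
std::vector<double> runif_std(int n, double min, double max, unsigned seed) {
    assert(n >= 0 && min < max);
    std::mt19937 gen(seed);                                  // Mersenne Twister engine
    std::uniform_real_distribution<double> dist(min, max);   // specified as [min, max)
    std::vector<double> out(static_cast<std::size_t>(n));
    for (double& v : out) v = dist(gen);
    return out;
}
```

Calling it twice with the same seed gives the same sequence, which is convenient for tests but is a different stream from R's.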
As of R-3.1.1, runif uses the .External interface, which copies its arguments. Luke Tierney changed this to use the .Call interface in R-devel in revision 66110. The .Call interface does not copy its arguments. Rcpp uses the .Call interface.
Your C++ code is still faster under R-devel (using the .Call interface). This is likely because of differences in the random number generator being used. Also, R's functions will generally have more checks than whatever specialized code you write; and those checks take time.
I mainly use R, but eventually would like to use Rcpp to interface with some C++ functions that take in and return 2d numeric arrays. So to start out playing around with C++ and Rcpp, I thought I'd just make a little function that converts my R list of variable-length numeric vectors to the C++ equivalent and back again.
require(inline)
require(Rcpp)
test1 = cxxfunction(signature(x='List'), body =
'
using namespace std;
List xlist(x);
int xlen = xlist.size();
vector< vector<int> > xx;
for(int i=0; i<xlen; i++) {
vector<int> test = as<vector<int> > (xlist[i]);
xx.push_back(test);
}
return(wrap(xx));
'
, plugin='Rcpp')
This works like I expect:
> test1(list(1:2, 4:6))
[[1]]
[1] 1 2
[[2]]
[1] 4 5 6
Admittedly I am only part way through the very thorough documentation, but is there a nicer (i.e. more Rcpp-like) way to do the R -> C++ conversion than with the for loop? I am thinking possibly not, since the documentation mentions that (at least with the built-in methods) as "offers less flexibility and currently handles conversion of R objects into primitive types", but I wanted to check because I'm very much a novice in this area.
I will give you bonus points for a reproducible example, and of course for using Rcpp :) And then I will take those away for not asking on the rcpp-devel list...
As for converting STL types: you don't have to, but when you decide to do it, the as<>() idiom is correct. The only 'better way' I can think of is to do name lookup as you would in R itself:
require(inline)
require(Rcpp)
set.seed(42)
xl <- list(U=runif(4), N=rnorm(4), T2df=rt(4,2))
fun <- cxxfunction(signature(x="list"), plugin="Rcpp", body = '
Rcpp::List xl(x);
std::vector<double> u = Rcpp::as<std::vector<double> >(xl["U"]);
std::vector<double> n = Rcpp::as<std::vector<double> >(xl["N"]);
std::vector<double> t2 = Rcpp::as<std::vector<double> >(xl["T2df"]);
// do something clever here
return(R_NilValue);
')
Hope that helps. Otherwise, the list is always open...
PS As for the two-dim array, that is trickier as there is no native C++ two-dim array. If you actually want to do linear algebra, look at RcppArmadillo and RcppEigen.
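If linear algebra is not needed, one common workaround is a flat buffer with manual index arithmetic. The class below is a hypothetical minimal sketch (Matrix2D is not an Rcpp, Armadillo or Eigen type) that stores values in column-major order, the same layout R uses for a matrix, so element (i, j) lives at offset i + j * nrow.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal 2-D container: one flat buffer in column-major order.
class Matrix2D {
public:
    Matrix2D(std::size_t nrow, std::size_t ncol)
        : nrow_(nrow), ncol_(ncol), data_(nrow * ncol, 0.0) {}

    // Zero-based element access; (i, j) maps to offset i + j * nrow.
    double& operator()(std::size_t i, std::size_t j) {
        assert(i < nrow_ && j < ncol_);
        return data_[i + j * nrow_];
    }
    double operator()(std::size_t i, std::size_t j) const {
        assert(i < nrow_ && j < ncol_);
        return data_[i + j * nrow_];
    }

    std::size_t nrow() const { return nrow_; }
    std::size_t ncol() const { return ncol_; }
    const std::vector<double>& data() const { return data_; }

private:
    std::size_t nrow_;
    std::size_t ncol_;
    std::vector<double> data_;
};
```

Because the layout matches R's, the flat buffer could in principle be copied to or from an R matrix without reordering; for anything beyond element access the libraries mentioned above are the better choice.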