Rcpp function to be SLOWER than same R function - c++

I have been coding a R function to compute an integral with respect to certain distributions, see code below.
EVofPsi = function(psi, probabilityMeasure, eps=0.01, ...){
distFun = function(u){
probabilityMeasure(u, ...)
}
xx = yy = seq(0,1,length=1/eps+1)
summand=0
for(i in 1:(length(xx)-1)){
for(j in 1:(length(yy)-1)){
signPlus = distFun(c(xx[i+1],yy[j+1]))+distFun(c(xx[i],yy[j]))
signMinus = distFun(c(xx[i+1],yy[j]))+distFun(c(xx[i],yy[j+1]))
summand = c(summand, psi(c(xx[i],yy[j]))*(signPlus-signMinus))
}
}
sum(summand)
}
It works fine, but it is pretty slow. It is common to hear that re-programming the function in a compiled language such as C++ would speed it up, especially because the R code above involves a double loop. So did I, using Rcpp:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double EVofPsiCPP(Function distFun, Function psi, int n, double eps) {
NumericVector xx(n+1);
NumericVector yy(n+1);
xx[0] = 0;
yy[0] = 0;
// discretize [0,1]^2
for(int i = 1; i < n+1; i++) {
xx[i] = xx[i-1] + eps;
yy[i] = yy[i-1] + eps;
}
Function psiCPP(psi);
Function distFunCPP(distFun);
double signPlus;
double signMinus;
double summand = 0;
NumericVector topRight(2);
NumericVector bottomLeft(2);
NumericVector bottomRight(2);
NumericVector topLeft(2);
// compute the integral
for(int i=0; i<n; i++){
//printf("i:%d \n",i);
for(int j=0; j<n; j++){
//printf("j:%d \n",j);
topRight[0] = xx[i+1];
topRight[1] = yy[j+1];
bottomLeft[0] = xx[i];
bottomLeft[1] = yy[j];
bottomRight[0] = xx[i+1];
bottomRight[1] = yy[j];
topLeft[0] = xx[i];
topLeft[1] = yy[j+1];
signPlus = NumericVector(distFunCPP(topRight))[0] + NumericVector(distFunCPP(bottomLeft))[0];
signMinus = NumericVector(distFunCPP(bottomRight))[0] + NumericVector(distFunCPP(topLeft))[0];
summand = summand + NumericVector(psiCPP(bottomLeft))[0]*(signPlus-signMinus);
//printf("summand:%f \n",summand);
}
}
return summand;
}
I'm pretty happy since this C++ function works fine. However, when I tested both functions, the C++ one ran SLOWER:
sourceCpp("EVofPsiCPP.cpp")
pFGM = function(u,theta){
u[1]*u[2] + theta*u[1]*u[2]*(1-u[1])*(1-u[2])
}
psi = function(u){
u[1]*u[2]
}
print(system.time(
for(i in 1:10){
test = EVofPsi(psi, pFGM, 1/100, 0.2)
}
))
test
print(system.time(
for(i in 1:10){
test = EVofPsiCPP(psi, function(u){pFGM(u,0.2)}, 100, 1/100)
}
))
So, is there some kind expert around willing to explain me this? Did I code like a monkey and is there a way to speed up that function? Moreover, I would have a second question. Indeed, I could have replaced the output type double by SEXP, and the argument types Function by SEXP as well, it doesn't seem to change anything. So what is the difference?
Thank you very much in advance,
Gildas

Others have answered in comments already. So I'll just emphasize the point: Calling back to R functions is expensive as we need to be extra cautious about error handling. Just having the loop in C++ and call R functions is not rewriting your code in C++. Try rewriting psi and pFGM as C++ functions and report back here what happens.
You might argue that you lose some flexibility and you're not able anymore to use any R function. For situations like this, I'd advise to use some sort of hybrid solution where you have implemented the most common cases in C++ and fallback to an R solution otherwise.
As for the other question, a SEXP is an R object. This is part of the R API. It can be anything. When you create a Function from it (as is done implicitly for you when create a function that takes a Function argument), you are guaranteed that this is indeed an R function. The overhead is very small, but the gain in terms of expressiveness of your code is huge.

Related

Is there a way to 'reset' a functions variables?

I recently made a function to compare an array of numbers to a single value which returns the closest value to the single value out of the array. This works perfectly well when you only use it only once but if I use it again in another instance of the code, It returns an unexpected value (usually the previous single value used before). Here is the function that I am using:
double closestval (double num1, int amountofnums, double *comps){
double storagecomps[amountofnums];
for (int i = 0; i < amountofnums; i++){
storagecomps[i] = {comps[i]};
/* Storing the array of numbers for later as I will be changing them */
}
double smallval = 0.0001; /* tiny value used to increment/decrement
values in the array to the comparison variable.*/
int_fast64_t compi [amountofnums]; /* this variable keeps track of how many times it needs to decrement/increment the values in the array to approach the variable*/
for (int i = 0; i < amountofnums; i++){
compi[i] = 0;
}
for (int i = 0; i <= amountofnums; i++){
while (comps[i] > num1){
comps[i] -= smallval;
compi[i]++;
}
while (comps[i] < num1){
comps[i] += smallval;
compi[i]++;
}
double recholder[3] = {10000000, 0,};
// This area finds the
for (int i = 0; i < amountofnums; i++){
if (compi[i] < recholder[0]){
recholder[0] = compi[i];
recholder [1] = i;
recholder[2] = storagecomps[i]; /* if the amount of iterations to approach the single variable is less than the previous record holder, it becomes the new one.
*/
}
}
return recholder[2];
}
I am relatively sure this is because (in one way or another) the variables in the function are not being redefined properly or at all. Much thanks if you can show me where I've gone wrong!
The problem isn't resetting the variables. The problem is that you are modifying the arguments passed to the function.
To prevent modifications you should use the const keyword:
double closestval (double num1, int amountofnums, const double *comps){
and then fix the errors the compilers throws at you.
If you do want to modify the comps inside the functions but not have it affect the values outside the functions then you should usestd::vector so you can pass them by value and the compiler will copy them:
double closestval (double num1, int amountofnums, std::vector<double> comps){
You should really do that anyway as you should forget all about C-style arrays till you are an expert.

Unrestricted conversion from Array to TypedArray<std::complex<double>>?

Tried many things, just cannot get it to work when writing a mex-function.
I have an input from MATLAB which I pass to a method as const matlab::data::Array. This array may contain complex data, sometimes it's only real. So the most straightforward approach should be, in my naive thoughts, that I can simply convert the Array to a TypedArray<std::complex<double>> and I get full complex values if the array contains complex values, and I get complex values with imag=0 if the array contains only real values. It seems to be impossible... This last conversion is not accepted in any case, and MATLAB even simply crashes on trying to cast single elements from a real-valued Array to std::complex<double>.
Anybody a solution how to get a TypedArray<std::complex<double>> in all cases so I can use that in C++ code?
Story of my life, trying for hours and after posting here I find something that works within half an hour... Following code seems to do the job:
void prepareObject(const matlab::data::Array& corners, const matlab::data::Array& facets)
{
size_t N_facet_rows = facets.getDimensions()[0];
size_t N_facet_columns = facets.getDimensions()[1];
matlab::data::TypedArray<std::complex<double>> complex_facets = arrayFactory.createArray<std::complex<double>>(facets.getDimensions());
// Convert the facets to a complex-valued array.
if (facets.getType() == ArrayType::DOUBLE) {
std::complex<double> v;
// Input is DOUBLE, so for each value init a complex<double> and store that in the complex array.
v.imag(0);
for (int i_r = 0; i_r < N_facet_rows; i_r++) {
for (int i_c = 0; i_c < N_facet_columns; i_c++) {
v.real(facets[i_r][i_c]);
complex_facets[i_r][i_c] = v;
}
}
}
else {
// Input is COMPLEX_DOUBLE, so simply copy all values.
for (int i_r = 0; i_r < N_facet_rows; i_r++) {
for (int i_c = 0; i_c < N_facet_columns; i_c++) {
complex_facets[i_r][i_c] = (std::complex<double>) facets[i_r][i_c];
}
}
}

Rcpp: calling c++ function in R without exporting c++ function

I am trying to make a package with Rcpp. I have all of my C++ functions in a single .cpp file as follows:
double meanvec(NumericVector x) {
int n = x.size();
double tot = 0;
for (int i = 0; i < n; i++) {
tot += x[i];
}
tot /= n;
return tot;
}
double inprod(NumericVector u, NumericVector v) {
int m = u.size();
double val = 0;
for (int i = 0; i < m; i++) {
val += u[i] * v[i];
}
return val;
}
NumericVector lincoef(NumericVector x, NumericVector y) {
int n = x.size();
double xm = meanvec(x);
double ym = meanvec(y);
NumericVector xa(n);
for (int i = 0; i < n; i++) {
xa[i] = x[i] - xm;
}
NumericVector ya(n);
for (int i = 0; i < n; i++) {
ya[i] = y[i] - ym;
}
double b1 = inprod(xa, ya) / inprod(xa, xa);
double b0 = ym - (b1 * xm);
NumericVector beta = NumericVector::create(b0, b1);
return beta;
}
Basically, the last function takes two vectors as input and outputs a single vector. I would like to call this function into a separate .R file where I am trying to write another function. Something like this:
#' Title
#'
#' #param x Numeric vector.
#' #param y Numeric vector.
#'
#' #return
#' #export
linfit338 = function(x, y){
beta = .Call(`_pkg338_lincoef`, x, y)
fmod = function(x){
beta[1] + beta[2]*x
}
flist = list(beta, fmod)
return(flist)
}
Here the output is a list, where the first element is a vector from the C++ function being called and the second element is the created function. When I try to install and restart, I get this error message:
RcppExports.o:RcppExports.cpp:(.rdata+0x790): undefined reference to `_pkg338_lincoef'
My guess is that is has something to do with exporting the function. When I add // [[Rcpp::export]] above the lincoef function in the C++ file, I don't get any error message, and my final R function works. However, my whole goal is that I do not want the lincoef function exported at all.
Any way to fix this? I would also be open to suggestions as to how I can improve organizing these files, as this is my first experience building a package with Rcpp.
I think you're probably mixing up the concept of exporting C++ code to be used in R (via // [[Rcpp::export]]), which is entirely different to exporting R functions from your package, i.e. making those functions available to end-users of your package.
To make your Rcpp functions callable from within R at all, you need to // [[Rcpp::export]] them. If you don't do this, none of your C++ code will be available from within your R package.
It sounds like what you would like to do is to use the Rcpp-exported functions within your package but to hide them from end-users. This is a common use case for Rcpp, as it allows you to have an R function that acts as an end-user interface to your C++ code, while leaving you free to alter the C++ implementation in future developments without the risk of breaking existing users' code.
Any function you have created within your package, be it an R function or an Rcpp-exported function, has to actively be exported from your package to make it available to end-users. This is a different concept from // [[Rcpp::export]], which is needed to access C++ functions from within your package's R code.
Any R functions will only be exported from your R package if you specify them in the NAMESPACE file in your project's root directory. Thus to export myfunction() you need to have a line that says export(myfunction) in the NAMESPACE file. You are using roxygen2, which will generate this line automatically as long as you write #export in the roxygen skeleton. An alternative to using roxygen's exporting system is to specify an exportPattern in the NAMESPACE file that uses regex to export only functions whose names match a certain pattern.
My usual workflow is to prefix any Rcpp-exported functions with a period by writing my C++ functions like this:
// [[Rcpp::export(.MyCppFunction)]]
int BoringFunction() { return 0; }
I can now call the C++ function from R like this:
MyRFunction <- function()
{
result <- .MyCppFunction()
return(result)
}
The first line in my NAMESPACE file looks like this:
exportPattern("^[[:alpha:]]+")
Which means that any R function in my package starting with a letter will be exported. Since all the functions I Rcpp::export start with a period, I can use them internally within the R package but they won't be exported to end-users.
In other words, end-users of the package can call MyRFunction() but would get an error if they tried to call .MyCppFunction

Calling BFGS optimization in C++ from optim.c

So I have to rewrite my R code to C++. That is relatively easy, given the Rcpp package. I came across a problem while solving an optimization problem. In R I call:
optimum_optim = optim(par=A, fn=negative_LL, gr=negative_grad_LL, .c = c, .t = t, .i = i, .N = N, method = 'BFGS')
Given that I have rewritten the negative_LL and negative_grad_LL functions into my C++ file already I wanted to call the underlying routine for BFGS optimization from R: it is the vmmin function from optim.c
I have the problem that I cannot understand the signature of that function. It is:
vmmin(int n0, double *b, double *Fmin, optimfn fminfn, optimgr fmingr,
int maxit, int trace, int *mask,
double abstol, double reltol, int nREPORT, void *ex,
int *fncount, int *grcount, int *fail)
It's not that I did not put any effort into search - I just cannot find a description... Could someone please help call this function in my particular case (and tell me what the arguments are)?
It sounds like you've already been using this advice, but you need to go a bit deeper: "Use the source, Luke".
My starting point was to, from the R console, type simply
optim
This prints the R source code of that function. There I saw it calls
.External2(C_optim, par, fn1, gr1, method, con, lower, upper)
My favorite mirror for the R source code is this GitHub repo. If you head there, search for "optim", and filter only the C results, we'll go to the top hit, src/library/stats/src/optim.c. Then we can see how the C-level optim() (line 177) function calls vmmin() (line 295).
The way optim() initializes those arguments is as follows
int n length(par)
double *b vect(npar); dpar[i] = REAL(par)[i] / (OS->parscale[i])
double *Fmin 0.0
optimfn fn function defined in the C code
optimgr gr function defined in the C code
int maxit asInteger(getListElement(options, "maxit"))
int trace asInteger(getListElement(options, "trace"))
int *mask mask = (int *) R_alloc(npar, sizeof(int));
for (i = 0; i < npar; i++) mask[i] = 1;
double abstol asInteger(getListElement(options, "abstol"))
double reltol asInteger(getListElement(options, "reltol"))
int nREPORT asInteger(getListElement(options, "REPORT"));
void *ex OptStruct OS; /* tons of stuff done to this */
int *fncount 0
int *grcount 0
int *fail 0
I haven't put in all the details here, but I believe this should be enough to help you figure out how you need to use these things in your own function, once you find out about one other thing: the control list in optim(). If you notice in the .External2() call from above, there's an argument called con. This is defined in the R code as
con <- list(trace = 0, fnscale = 1, parscale = rep.int(1, npar),
ndeps = rep.int(1e-3, npar),
maxit = 100L, abstol = -Inf, reltol = sqrt(.Machine$double.eps),
alpha = 1.0, beta = 0.5, gamma = 2.0,
REPORT = 10, warn.1d.NelderMead = TRUE,
type = 1,
lmm = 5, factr = 1e7, pgtol = 0,
tmax = 10, temp = 10.0)
though these elements can be overridden by user input in the control argument, and if you check out help("optim"), you'll see
The ‘control’ argument is a list that can supply any of the following components:
‘trace’ ...
The C function refers to this list by the name options that you see referenced several times in the table I constructed above.

optimizing rcpp code

I started trying to use rcpp to improve the speed of a for loop in R where each iteration depends on the previous (i.e. no easy vectorization). My current code (below) is a bit faster than R but no nearly as fast as I would have thought. Any glaring inefficiencies in the code below that someone can spot? Any general (or specific) advice would be helpful.
UpdateInfections <- cxxfunction(signature(pop ="data.frame",inds="integer",alpha="numeric",t="numeric"), '
DataFrame DF(pop);
IntegerVector xinds(inds);
NumericVector inf_time = DF["inf.time"];
IntegerVector loc = DF["loc"] ;
IntegerVector Rind = DF["R.indiv"] ;
NumericVector infector = DF["infector"] ;
IntegerVector vac = DF["vac"] ;
NumericVector wts(loc.size());
double xt = Rcpp::as<double>(t);
double xalpha = Rcpp::as<double>(alpha);
RNGScope scope; // Initialize Random number generator
Environment base("package:base");
Function sample = base["sample"];
int n = loc.size();
int i;int j;int k;
int infsize = xinds.size();
for (i=0;i<infsize;i++) {
int infpoint = xinds[i]-1;
NumericVector inf_times_prop(Rind[infpoint]);
NumericVector inf_me(Rind[infpoint]);
for (j=0; j<n;j++){
if (j == infpoint){
wts[j] = 0.0;
} else if (loc[j] == loc[infpoint]){
wts[j] = 1.0;
} else {
wts[j] = xalpha;
}
}
inf_me = sample(n,Named("size",Rind[infpoint]),Named("prob",wts));
//Note that these will be shifted by one
for (k=0;k<Rind[infpoint];k++){
inf_times_prop[k] = floor(::Rf_rlnorm(1.6,.6) + 0.5 + xt);
if (inf_times_prop[k] < inf_time[inf_me[k]-1] && vac[inf_me[k]-1] == 0){
inf_time[inf_me[k]-1] = inf_times_prop[k];
infector[inf_me[k]-1] = inf_me[k];
}
}
}
// create a new data frame
Rcpp::DataFrame NDF =
Rcpp::DataFrame::create(Rcpp::Named("inf.time")=inf_time,
Rcpp::Named("loc")=loc,
Rcpp::Named("R.indiv")=Rind,
Rcpp::Named("infector")=infector,
Rcpp::Named("vac")=vac);
return(NDF);
' , plugin = "Rcpp" )
We're actually working on a pure C++ sample function for RcppArmadillo right now. Take a look here http://permalink.gmane.org/gmane.comp.lang.r.rcpp/4179 or here http://permalink.gmane.org/gmane.comp.lang.r.rcpp for updates.
You are calling back to R. That cannot be as fast a pure C++ solution.
Your example is also long, too long. I recommend profiling and optimizing individual pieces. There is, alas, still no entirely free lunch.