I want to improve the speed of some of my R code using Rcpp. However, my knowledge of C++ is very little. So, I checked the documentation provided with Rcpp, and other documents provided at Dirk Eddelbuttel’s site. After reading all the stuff, I tried to execute a simple loop that I wrote in R. unfortunately, I was unable to do it. Here is the R function:
Inverse Wishart
beta = matrix(rnorm(15),ncol=3)
a = rnorm(3)
InW = function(beta,a) {
n = nrow(beta)
p = ncol(beta)
I = diag(rep(1,times = p))
H = matrix(0,nrow=p,ncol=p)
for(i in 1:n){
subBi = beta[i,]
H = H + tcrossprod(a - subBi)
}
H = H + p * I
T = t(chol(chol2inv(chol(H))))
S = 0
for(i in 1:(n+p)){
u <- rnorm(p)
S = S + tcrossprod(T %*% u)
}
D = chol2inv(chol((S)))
ans = list(Dinv = S,D=D)
}
I truly, appreciate if someone can help me as it will serve as starting point in learning Rcpp.
A basic example of RcppArmadillo goes like this,
require(RcppArmadillo)
require(inline)
code <- '
arma::mat beta = Rcpp::as<arma::mat>(beta_);
int n = beta.n_rows; int p = beta.n_cols;
arma::mat Ip = arma::eye<arma::mat>( p, p );
int ii;
double S=0;
for (ii=0; ii<(n+p); ii++) {
S += ii; // dummy calculation
}
return Rcpp::wrap(S);
'
fun <- cxxfunction(signature(beta_ ="matrix"),
code, plugin="RcppArmadillo")
m <- matrix(1:9,3)
fun(m)
and you can browse armadillo's doc to find the more advanced bits and pieces.
an answer to my first question is shown below. It may be not the efficient way but the Rcpp code gives the same results as the R code. I appreciate the help from baptiste.
code <- '<br/>
arma::mat beta = Rcpp::as<arma::mat>(beta_);
arma::rowvec y = Rcpp::as<arma::rowvec>(y_);
int n = beta.n_rows; int p = beta.n_cols;
arma::mat Ip = arma::eye<arma::mat>( p, p );
int ii;
arma::mat H1 = beta, d;
arma::mat H2=H1.zeros(p,p);
arma::rowvec S;
for (ii=0;ii<n;ii++){
S= beta.row(ii);
d = trans(y - S)*(y-S);
H2 = H2 + d ;
}
arma::mat H = chol(H2+p*Ip);
arma::mat Q , R;
qr(Q,R,H);
arma::mat RR = R;
arma::mat TT = trans(chol(solve(trans(RR)*RR,Ip)));
int jj;
arma::mat SS = H1.zeros(p,p);
arma::colvec u;
arma::colvec V;
for(jj=0;jj<(n+p);jj++) {
u = rnorm(p);
V = TT*u;
SS = SS + V * trans(V);
}
arma::mat SS1 = chol(SS);
arma::mat Q1 , R1;
qr(Q1,R1,SS1);
arma::mat SS2 = R1;
arma::mat D = solve(trans(SS2)*SS2,Ip);
return Rcpp::List::create(Rcpp::Named("Dinv")=SS,Rcpp::Named("D")=D);
'
fun = cxxfunction(signature(beta_ ="matrix",y_="numeric"),code, plugin="RcppArmadillo")
m = matrix(rnorm(100),ncol=5)
vec = rnorm(5)
fun(m,vec)
Related
I am new to Rcpp and am looking to make the following code as fast as possible. Any tips would be greatly appreciated.
It is running a lot slower than I would like.
I have tried vectorising as much as possible but i'm not sure how to vectorise further
// [[Rcpp::export]]
arma::mat fn_update_log_posterior(arma::mat current_logdiffs_set_mat,
int K,
int N,
int l_class,
int ord_test,
int n_bin_tests,
arma::mat first_cutpoint_mat,
arma::mat prev,
arma::cube prob_test,
arma::mat prob,
arma::vec class_ind,
arma::cube Xbeta,
double prior_ind_dir,
arma::mat yp1,
arma::mat y,
double prior_densities) {
arma::mat current_cutpoints_set_full = fn_calculate_cutpoints(current_logdiffs_set_mat, first_cutpoint_mat, K);
arma::mat log_prob(1,1);
arma::vec lp(N);
arma::vec lower_ord_inv_prob(N);
arma::vec upper_ord_inv_prob(N);
arma::vec lower_ord_prob(N);
arma::vec upper_ord_prob(N);
arma::mat prob_test_n(prob_test.n_cols, prob_test.n_slices);
for (int n = 0; n < N;++ n) {
upper_ord_inv_prob.at(n) = current_cutpoints_set_full.at(yp1.at(n, ord_test)) - Xbeta.at(l_class , n , ord_test);
lower_ord_inv_prob.at(n) = current_cutpoints_set_full.at(y.at(n, ord_test)) - Xbeta.at(l_class, n , ord_test);
}
upper_ord_prob = fn_Phi_approx_vec_2(upper_ord_inv_prob);
lower_ord_prob = fn_Phi_approx_vec_2(lower_ord_inv_prob);
for (int n = 0; n < N; ++n) {
prob_test.at(n, ord_test, l_class) = log( upper_ord_prob(n) - lower_ord_prob(n) );
prob_test_n = prob_test.row(n);
prob.at(n,l_class) = sum(prob_test_n.col(l_class)) + log(prev.at(0, l_class));
lp.at(n) = prob.at(n,class_ind(n) - 1); // works for any number of classes
}
log_prob(0,0) = sum(lp) + prior_densities;
return(log_prob);
}
I am working on speeding up a program I wrote in R. The code involves repeatedly computing LogSumExp over multidimensional arrays, i.e computing s_lnj = exp(u_lnj) / (1 + sum_k exp(u_lnk)). The base R version of the code I am trying to increase the speed of is the following:
log_sum_exp_func <- function(vec){
max_vec <- max(vec)
return(max_vec + log(sum(exp(vec-max_vec))))
}
compute_share_from_utils_func <- function(u_lnj){
### get dimensions
L <- dim(u_lnj)[1]; n_poly <- dim(u_lnj)[2]; J <- dim(u_lnj)[3]
### compute denominator of share, 1 + sum exp utils
den_ln <- 1 + exp(apply(u_lnj, c(1,2), log_sum_exp_func))
den_lnj <- array(rep(den_ln, J), dim = c(L, n_poly, J))
### take ratio of utils and denominator
s_lnj <- exp(u_lnj) / den_lnj
return(s_lnj)
}
I tried to use xtensor and Rcpp to speed things up, but ran into several issues. The Rcpp code I wrote is the following
// [[Rcpp::depends(xtensor)]]
// [[Rcpp::plugins(cpp14)]]
#include <numeric> // Standard library import for std::accumulate
#define STRICT_R_HEADERS // Otherwise a PI macro is defined in R
#include "xtensor/xmath.hpp" // xtensor import for the C++ universal functions
#include "xtensor/xarray.hpp"
#include "xtensor/xio.hpp"
#include "xtensor/xview.hpp"
#include "xtensor-r/rarray.hpp" // R bindings
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double cxxlog_sum_exp_vec(xt::rarray<double>& m)
{
auto shape_m = m.shape();
double maxvec = xt::amax(m)[0];
xt::rarray<double> arr_maxvec = maxvec * xt::ones<double>(shape_m);
xt::rarray<double> vec_min_max = m - arr_maxvec;
xt::rarray<double> exp_vec_min_max = xt::exp(vec_min_max);
double sum_exp = xt::sum(exp_vec_min_max)[0];
double log_sum_exp = std::log(sum_exp);
return log_sum_exp + maxvec;
}
// [[Rcpp::export]]
xt::rarray<double> cxxshare_from_utils(xt::rarray<double>& u_lnj)
{
int L = u_lnj.shape(0);
int N = u_lnj.shape(1);
int J = u_lnj.shape(2);
xt::rarray<double> res = xt::ones<double>({L,N,J});
for (std::size_t l = 0; l < u_lnj.shape()[0]; ++l)
{
for (std::size_t n = 0; n < u_lnj.shape()[1]; ++n)
{
xt::rarray<double> utils_j = xt::view(u_lnj, l, n, xt::all());
double inv_lse = 1 / (1 + std::exp(cxxlog_sum_exp_vec(utils_j)));
for (std::size_t j = 0; j < J; ++j)
{
res(l, n, j) = std::exp(u_lnj(l, n, j)) * inv_lse;
}
}
}
return res;
}
The Rcpp implementation does seem to yield the same results as the base R code, however it seems to encounter problems whenever the dimensions of the input array increase. My R Session fails if I run
L <- 100
n <- 100
J <- 200
u_lnj <- array(rnorm(L*n*J,0,2), dim = c(L, n, J))
test <- cxxshare_from_utils(u_lnj)
But the code runs fine for L, n, J = 10,10,20 for instance. Moreover, the C++ implementation of log_sum_exp does not seem to outperform the base R version that much.
EDIT: I could not figure out what was the issue with the way I am using xtensor. But I did get some speed up with the following RcppArmadillo code. The drawback of this version is that is likely not as robust to overflow as the base R function relying on Log Sum Exp.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::plugins(cpp14)]]
// [[Rcpp::export]]
arma::cube cxxarma_share_from_utils(arma::cube u_lnj) {
// Extract the different dimensions
// Normal Matrix dimensions
unsigned int L = u_lnj.n_rows;
unsigned int N = u_lnj.n_cols;
// Depth of Array
unsigned int J = u_lnj.n_slices;
//resulting cube
arma::cube s_lnj = arma::exp(u_lnj);
for (unsigned int l = 0; l < L; l++) {
for (unsigned int n = 0; n < N; n++) {
double den = 1 / (1 + arma::accu(s_lnj.subcube(arma::span(l), arma::span(n), arma::span())));
for (unsigned int j = 0; j < J; j++) {
s_lnj(l, n, j) = s_lnj(l, n, j) * den;
}
}
}
return s_lnj;
}
I am a new user of Rcpp and I am writing an package.
I have defined two functions in one script and try to call one from another in the loop.
One of my function defined as below:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
double timesTwo(colvec x, NumericVector group, double k,
NumericVector unique_group)
{
vec beta(x.begin(),x.size(),false);
vec Group(group.begin(),group.size(),false);
vec unigroup(unique_group.begin(),unique_group.size(),false);
beta = pow(beta,k);
int g = unigroup.size();
int j = 0;
uvec st ;
double b=0;
for(j = 0; j < g; j++)
{
st = find(Group == unigroup[j]);
b = b + abs(pow(sum(beta.elem(st)),1/k));
}
double s = b;
return s;
}
And the loop I use to to call this function is like below:
XtXi_beta_plus.col(i) = X_MAT * BETA_NEW.col(2*i);
XtXi_beta_minus.col(i) = X_MAT * BETA_NEW.col(2*i+1);
loss_new_1.col(i) = (Y - ited / (ited + exp( -XtXi_beta_plus.col(i))));
loss_new_2.col(i) = (Y - ited / (ited + exp( -XtXi_beta_minus.col(i))));
new_loss(2*i) = accu(loss_new_1.col(i) % loss_new_1.col(i));
new_loss(2*i+1) = accu(loss_new_2.col(i) % loss_new_2.col(i));
z = BETA_NEW.col(2*i);
w = BETA_NEW.col(2*i+1);
// when 88 line was change to BETA_NEW.col(2*i) there is an error
// if you keep use Z, there is no update
// best!
pen_new_positive(i) = as<double>(timesTwo(z,group,k,unique_group));
My question is just like the comment I said in the loop, since I want to update that pen_new_postive(i) based on the BETA_NEW.col(2*i) However when I directly put BETA_NEW.COL(2*i) inside the timesTwo function, no matter how I change input
type of function (colvec or mat or whatever) there is error like below:
cannot convert "const::arma::subview_col<double> to "SEXP" in
initialization"
However when I directly use z in the timesTwo function, there is no update for my z in the loop.
Anyone could give me a hint about how to deal with this?
The full version of my code in second block as below:
#include <RcppArmadillo.h>
#include <math.h>
//#include <omp.h>
using namespace Rcpp;
using namespace arma;
//// [[Rcpp::plugins(openmp)]]
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
List minghan21041(NumericMatrix beta_new, NumericVector diff_loss,
NumericVector beta, double step_size, NumericVector y,
double k,
NumericVector group,
NumericVector unique_group,
NumericMatrix X) {
int n = X.nrow(), p=X.ncol() , n1=beta_new.nrow(), p1=beta_new.ncol();
mat X_MAT(X.begin(),n,p,false), BETA_NEW(beta_new.begin(),n1,p1,false);
vec BETA(beta.begin(),beta.size(),false);
vec Y(y.begin(),y.size(),false),
Diff_Loss(diff_loss.begin(),diff_loss.size(),false), iter(p,fill::zeros);
mat XtXi_beta_plus(n,p,fill::zeros);
mat XtXi_beta_minus(n,p,fill::zeros);
vec ited(n,fill::ones);
mat loss_new_1(n,p, fill::zeros),loss_new_2(n,p, fill::zeros);
colvec loss_new_3(n,fill::zeros);
vec new_loss(p1,fill::zeros);
uword index;
vec beta_final(p,fill::zeros);
vec pen_new_positive(p,fill::zeros);
vec pen_new_negative(p,fill::zeros);
double pen_old = 0;
Function timesTwo( "timesTwo" );
double cs=0;
//Col dc(BETA.begin(),p,1);
//pen_old = as<double>(timesTwo(dc,group,k,unique_group));
vec z(p,fill::zeros);
vec w(p,fill::zeros);
int i = 0;
vec XtXi_beta_old = X_MAT*BETA;
loss_new_3= (Y - ited / (ited + exp(-XtXi_beta_old)));
double loss_old_one = accu(loss_new_3%loss_new_3);
//#pragma omp parallel private(i) num_threads(4)
//{
//#pragma omp for ordered schedule(static,1)
for(i= 0; i<p; i++){
//#pragma omp ordered
//{
iter(i) = step_size;
BETA_NEW.col(2*i) = BETA + iter;
BETA_NEW.col(2*i+1) = BETA - iter;
XtXi_beta_plus.col(i) = X_MAT * BETA_NEW.col(2*i);
XtXi_beta_minus.col(i) = X_MAT * BETA_NEW.col(2*i+1);
loss_new_1.col(i) = (Y - ited / (ited + exp( -XtXi_beta_plus.col(i))));
loss_new_2.col(i) = (Y - ited / (ited + exp( -XtXi_beta_minus.col(i))));
new_loss(2*i) = accu(loss_new_1.col(i) % loss_new_1.col(i));
new_loss(2*i+1) = accu(loss_new_2.col(i) % loss_new_2.col(i));
z = BETA_NEW.col(2*i);
w = BETA_NEW.col(2*i+1);
// when 88 line was change to BETA_NEW.col(2*i) there is an error
// if you keep use Z, there is no update, you can source this file and I
//believe there is no other error
// best!
pen_new_positive(i) = as<double>(timesTwo(BETA_NEW.col(2*i),group,k,unique_group));
cs = pen_new_positive(i);
//Rcout << "cs" << cs << std::endl;
Rcout << "cs" << z << std::endl;
//pen_new_negative = as< std::vector<double> >(time(w,group,k,unique_group));
Diff_Loss(2*i) = new_loss(2*i) - loss_old_one + cs;
Diff_Loss(2*i+1) = new_loss(2*i+1) - loss_old_one + cs;
iter(i) = 0;
}
//}
//}
index = Diff_Loss.index_min();
beta_final = BETA_NEW.col(index);
return List::create( _["index"] = wrap(index),
_["Diff_Loss"]= wrap(Diff_Loss[index]),
_["ste"] =wrap(Diff_Loss),
_["beta_new"] = wrap(beta_final),
_["New_LOSS"]= wrap(new_loss[index]),
_["t"] = wrap(pen_new_positive));
}
I would like to calculate the pairwise euclidean distance matrix. I wrote Rcpp programs by the suggestion of Dirk Eddelbuettel as follows
NumericMatrix calcPWD1 (NumericMatrix x){
int outrows = x.nrow();
double d;
NumericMatrix out(outrows,outrows);
for (int i = 0 ; i < outrows - 1; i++){
for (int j = i + 1 ; j < outrows ; j ++){
NumericVector v1= x.row(i);
NumericVector v2= x.row(j);
NumericVector v3=v1-v2;
d = sqrt(sum(pow(v3,2)));
out(j,i)=d;
out(i,j)=d;
}
}
return (out) ;
}
But I find my program is slower than dist function.
> benchmark(as.matrix(dist(b)),calcPWD1(b))
test replications elapsed relative user.self sys.self user.child sys.child
1 as.matrix(dist(b)) 100 24.831 1.000 24.679 0.010 0 0
2 calcPWD1(b) 100 27.362 1.102 27.346 0.007 0 0
Do you guys have any suggestion? My matrix is very simple. There is no column names or row names, just plain matrix (for example like b=matrix(c(rnorm(1000*10)),1000,10)).
Here is the program of dist
> dist
function (x, method = "euclidean", diag = FALSE, upper = FALSE,
p = 2)
{
if (!is.na(pmatch(method, "euclidian")))
method <- "euclidean"
METHODS <- c("euclidean", "maximum", "manhattan", "canberra",
"binary", "minkowski")
method <- pmatch(method, METHODS)
if (is.na(method))
stop("invalid distance method")
if (method == -1)
stop("ambiguous distance method")
x <- as.matrix(x)
N <- nrow(x)
attrs <- if (method == 6L)
list(Size = N, Labels = dimnames(x)[[1L]], Diag = diag,
Upper = upper, method = METHODS[method], p = p, call = match.call(),
class = "dist")
else list(Size = N, Labels = dimnames(x)[[1L]], Diag = diag,
Upper = upper, method = METHODS[method], call = match.call(),
class = "dist")
.Call(C_Cdist, x, method, attrs, p)
}
<bytecode: 0x56b0d40>
<environment: namespace:stats>
I expect my program is faster than dist since in dist, there are too many thing to need to be checked (like method, diag).
Rcpp vs. Internal R Functions (C/Fortran)
First of all, just because you are writing the algorithm using Rcpp does not necessarily mean it will beat out the R equivalent, especially if the R function calls a C or Fortran routine to perform the bulk of the computations. In other cases where the function is written purely in R, there is a high probability that transforming it in Rcpp will yield the desired speed gain.
Remember, when rewriting internal functions, one is going up against the R Core team of absolutely insane C programmers most likely will win out.
Base Implementation of dist()
Secondly, the distance calculation R uses is done in C as indicated by:
.Call(C_Cdist, x, method, attrs, p)
, which is the last line of the dist() function's R source. This gives it a slight advantage vs. C++ as it more granular instead of templated.
Furthermore, the C implementation uses OpenMP when available to parallelize the computation.
Proposed modification
Thirdly, by changing the subset order slightly and avoiding creating an additional variable, the timings between versions decrease.
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::NumericMatrix calcPWD1 (const Rcpp::NumericMatrix & x){
unsigned int outrows = x.nrow(), i = 0, j = 0;
double d;
Rcpp::NumericMatrix out(outrows,outrows);
for (i = 0; i < outrows - 1; i++){
Rcpp::NumericVector v1 = x.row(i);
for (j = i + 1; j < outrows ; j ++){
d = sqrt(sum(pow(v1-x.row(j), 2.0)));
out(j,i)=d;
out(i,j)=d;
}
}
return out;
}
You were almost there. But your inner loop body tried to do too much in one line. Template programming is hard enough as it is, and sometimes it is just better to spread instructions out a little to give the compiler a better chance. So I just made it five statements, and built immediatelt.
New code:
#include <Rcpp.h>
using namespace Rcpp;
double dist1 (NumericVector x, NumericVector y){
int n = y.length();
double total = 0;
for (int i = 0; i < n ; ++i) {
total += pow(x(i)-y(i),2.0);
}
total = sqrt(total);
return total;
}
// [[Rcpp::export]]
NumericMatrix calcPWD (NumericMatrix x){
int outrows = x.nrow();
int outcols = x.nrow();
NumericMatrix out(outrows,outcols);
for (int i = 0 ; i < outrows - 1; i++){
for (int j = i + 1 ; j < outcols ; j ++) {
NumericVector v1 = x.row(i);
NumericVector v2 = x.row(j-1);
double d = dist1(v1, v2);
out(j-1,i) = d;
out(i,j-1)= d;
}
}
return (out) ;
}
/*** R
M <- matrix(log(1:9), 3, 3)
calcPWD(M)
*/
Running it:
R> sourceCpp("/tmp/mikebrown.cpp")
R> M <- matrix(log(1:9), 3, 3)
R> calcPWD(M)
[,1] [,2] [,3]
[1,] 0.000000 0.740322 0
[2,] 0.740322 0.000000 0
[3,] 0.000000 0.000000 0
R>
You may want to check your indexing logic though. Looks like you missed more comparisons.
Edit: For kicks, here is a more compact version of your distance function:
// [[Rcpp::export]]
double dist2(NumericVector x, NumericVector y){
double d = sqrt( sum( pow(x - y, 2) ) );
return d;
}
I wrote a (probably-inefficient, but anyway..) Rcpp code using inline to simulate a stochastic SEIR model.
The serial version compiles and works perfectly, but since I need to simulate from it a large number of times and since it seems to me like an embarrassingly parallel problem (just need to simulate again for other parameter values and return a matrix with the results) I tried to add #pragma omp parallel for and to compile with -fopenmp -lgomp but ... boom!
I get a segfault even for very small examples!
I tried to add setenv("OMP_STACKSIZE","24M",1); and values well over 24M but still the segfault happens.
I'll explain briefly the code since it's a bit long (I tried to shorten it but the result change and I can't reproduce it..):
I have two nested loops, the inner one execute the model for a given parameter set and the outer one changes the parameters.
The only reason a race condition might happen is if the code were trying to execute set of instructions inside inner the loop in parallel (which cannot be done because of the model structure, on iteration t it depends on iteration t-1) and not to parallelize the outer, but if I'm not mistaken that is what the parallel for constructor does for default if put just outside the outer...
This is basically the form of the code I'm trying to run:
mat result(n_param,T_MAX);
#pragma omp parallel for
for(int i=0,i<n_param_set;i++){
t=0;
rowvec jnk(T_MAX);
while(t < T_MAX){
...
jnk(t) = something(jnk(t-1));
...
t++;
}
result.row(i)=jnk;
}
return wrap(result);
And my question is: How I tell the compiler that I just want to compute in parallel the outer loop (even distributing them statically like n_loops/n_threads for each thread) and not the inner one (which is actually non-parallelizable)?
The real code is a bit more involved and I'll present it here for the sake of reproducibility if you're really willing, but I'm only asking about the behavior of OpenMP. Please notice that the only OpenMP instruction appears at line 122.
library(Rcpp);library(RcppArmadillo);library(inline)
misc='
#include <math.h>
#define _USE_MATH_DEFINES
#include <omp.h>
using namespace arma;
template <typename T> int sgn(T val) {
return (T(0) < val) - (val < T(0));
}
uvec rmultinomial(int n,vec prob)
{
int K = prob.n_elem;
uvec rN = zeros<uvec>(K);
double p_tot = sum(prob);
double pp;
for(int k = 0; k < K-1; k++) {
if(prob(k)>0) {
pp = prob[k] / p_tot;
rN(k) = ((pp < 1.) ? (rbinom(1,(double) n, pp))(0) : n);
n -= rN[k];
} else
rN[k] = 0;
if(n <= 0) /* we have all*/
return rN;
p_tot -= prob[k]; /* i.e. = sum(prob[(k+1):K]) */
}
rN[K-1] = n;
return rN;
}
'
model_and_summary='
mat SEIR_sim_plus_summaries()
{
vec alpha;
alpha << 0.002 << 0.0045;
vec beta;
beta << 0.01 << 0.01;
vec gamma;
gamma << 1.0/14.0 << 1.0/14.0;
vec sigma;
sigma << 1.0/(3.5) << 1.0/(3.5);
vec phi;
phi << 0.8 << 0.8;
int S_0 = 800;
int E_0 = 100;
int I_0 = 100;
int R_0 = 0;
int pop = 1000;
double tau = 0.01;
double t_0 = 0;
vec obs_time;
obs_time << 1 << 2 << 3 << 4 << 5 << 6 << 7 << 8 << 9 << 10 << 11 << 12 << 13 << 14 << 15 << 16 << 17 << 18 << 19 << 20 << 21 << 22 << 23 << 24;
const int n_obs = obs_time.n_elem;
const int n_part = alpha.n_elem;
mat stat(n_part,6);
//#pragma omp parallel for
for(int k=0;k<n_part;k++) {
ivec INC_i(n_obs);
ivec INC_o(n_obs);
// Event variables
double alpha_t;
int nX; //current number of people moving
vec rates(8);
uvec trans(4); // current transitions, e.g. from S to E,I,R,Universe
vec r(4); // rates e.g. from S to E, I, R, Univ.
/*********************** Initialize **********************/
int S_curr = S_0;
int S_prev = S_0;
int E_curr = E_0;
int E_prev = E_0;
int I_curr = I_0;
int I_prev = I_0;
int R_curr = R_0;
int R_prev = R_0;
int IncI_curr = 0;
int IncI_prev = 0;
int IncO_curr = 0;
int IncO_prev = 0;
double t_curr = t_0;
int t_idx =0;
while( t_idx < n_obs ) {
// next time preparation
t_curr += tau;
S_prev = S_curr;
E_prev = E_curr;
I_prev = I_curr;
R_prev = R_curr;
IncI_prev = IncI_curr;
IncO_prev = IncO_curr;
/*********************** description (rates) of the events **********************/
alpha_t = alpha(k)*(1+phi(k)*sin(2*M_PI*(t_curr+0)/52)); //real contact rate, time expressed in weeks
rates(0) = (alpha_t * ((double)I_curr / (double)pop ) * ((double)S_curr)); //e+1, s-1, r,i one s get infected (goes in E, not yey infectous)
rates(1) = (sigma(k) * E_curr); //e-1, i+1, r,s one exposed become infectous (goes in I) INCIDENCE!!
rates(2) = (gamma(k) * I_curr); //i-1, s,e, r+1 one i recover
rates(3) = (beta(k) * I_curr); //i-1, s, r,e one i dies
rates(4) = (beta(k) * R_curr); //i,e, s, r-1 one r dies
rates(5) = (beta(k) * E_curr); //e-1, s, r,i one e dies
rates(6) = (beta(k) * S_curr); //s-1 e, i ,r one s dies
rates(7) = (beta(k) * pop); //s+1 one susc is born
// Let the events occour
/*********************** S compartement **********************/
if((rates(0)+rates(6))>0){
nX = rbinom(1,S_prev,1-exp(-(rates(0)+rates(6))*tau))(0);
r(0) = rates(0)/(rates(0)+rates(6)); r(1) = 0.0; r(2) = 0; r(3) = rates(6)/(rates(0)+rates(6));
trans = rmultinomial(nX, r);
S_curr -= nX;
E_curr += trans(0);
I_curr += trans(1);
R_curr += trans(2);
//trans(3) contains dead individual, who disappear...we could avoid this using sequential conditional binomial
}
/*********************** E compartement **********************/
if((rates(1)+rates(5))>0){
nX = rbinom(1,E_prev,1-exp(-(rates(1)+rates(5))*tau))(0);
r(0) = 0.0; r(1) = rates(1)/(rates(1)+rates(5)); r(2) = 0.0; r(3) = rates(5)/(rates(1)+rates(5));
trans = rmultinomial(nX, r);
S_curr += trans(0);
E_curr -= nX;
I_curr += trans(1);
R_curr += trans(2);
IncI_curr += trans(1);
}
/*********************** I compartement **********************/
if((rates(2)+rates(3))>0){
nX = rbinom(1,I_prev,1-exp(-(rates(2)+rates(3))*tau))(0);
r(0) = 0.0; r(1) = 0.0; r(2) = rates(2)/(rates(2)+rates(3)); r(3) = rates(3)/(rates(2)+rates(3));
trans = rmultinomial(nX, r);
S_curr += trans(0);
E_curr += trans(1);
I_curr -= nX;
R_curr += trans(2);
IncO_curr += trans(2);
}
/*********************** R compartement **********************/
if(rates(4)>0){
nX = rbinom(1,R_prev,1-exp(-rates(4)*tau))(0);
r(0) = 0.0; r(1) = 0.0; r(2) = 0.0; r(3) = rates(4)/rates(4);
trans = rmultinomial(nX, r);
S_curr += trans(0);
E_curr += trans(1);
I_curr += trans(2);
R_curr -= nX;
}
/*********************** Universe **********************/
S_curr += pop - (S_curr+E_curr+I_curr+R_curr); //it should be poisson, but since the pop is fixed...
/*********************** Save & Continue **********************/
// Check if the time is interesting for us
if(t_curr > obs_time[t_idx]){
INC_i(t_idx) = IncI_curr;
INC_o(t_idx) = IncO_curr;
IncI_curr = IncI_prev = 0;
IncO_curr = IncO_prev = 0;
t_idx++;
}
//else just go on...
}
/*********************** Finished - Starting w/ stats **********************/
// INC_i is the useful variable, how can I change its reference withour copying it?
ivec incidence = INC_i; //just so if I want to use INC_o i have to change just this...
//Scan the epidemics to recover the summary stats (naively divide the data each 52 weeks)
double n_years = ceil((double)obs_time(n_obs-1)/52.0);
vec mu_attack(n_years);
vec ratio_attack(n_years-1);
vec peak(n_years);
vec atk(52);
peak(0)=0.0;
vec tmpExplo(52); //explosiveness
vec explo(n_years);
int year=0;
int week;
for(week=0 ; week<n_obs ; week++){
if(week - 52*year > 51){
mu_attack(year) = sum( atk )/(double)pop;
if(year>0)
ratio_attack(year-1) = mu_attack(year)/mu_attack(year-1);
for(int i=0;i<52;i++){
if(atk(i)>(peak(year)/2.0)){
tmpExplo(i) = 1.0;
} else {
tmpExplo(i) = 0.0;
}
}
explo(year) = sum(tmpExplo);
year++;
peak(year)=0.0;
}
atk(week-52*year) = incidence(week);
if( peak(year) < incidence(week) )
peak(year)=incidence(week);
}
if(week - 52*year > 51){
mu_attack(year) = sum( atk )/(double)pop;
} else {
ivec idx(52);
for(int i=0;i<52;i++)
{ idx(i) = i; } //take just the updated ones...
vec tmp = atk.elem(find(idx<(week - 52*year)));
mu_attack(year) = sum( tmp )/((double)pop * (tmp.n_elem/52.0));
ratio_attack(year-1) = mu_attack(year)/mu_attack(year-1);
for(int i=0;i<tmp.n_elem;i++){
if(tmp(i)>(peak(year)/2.0)){
tmpExplo(i) = 1.0;
} else {
tmpExplo(i) = 0.0;
}
}
for(int i=tmp.n_elem;i<52;i++)
tmpExplo(i) = 0.0; //to reset the others
explo(year) = sum(tmpExplo);
}
double correlation2;
double correlation4;
vec autocorr = acf(peak);
/***** ACF *****/
if(n_years<3){
correlation2=0.0;
correlation4=0.0;
} else {
if(n_years<5){
correlation2 = autocorr(1);
correlation4 = 0.0;
} else {
correlation2 = autocorr(1);
correlation4 = autocorr(3);
}
}
rowvec jnk(6);
jnk << sum(mu_attack)/(year+1.0)
<< (sum( log(ratio_attack)%log(ratio_attack) )/(n_years-1)) - (pow(sum( log(ratio_attack) )/(n_years-1),2))
<< correlation2 << correlation4 << max(peak) << sum(explo)/n_years;
stat.row(k) = jnk;
}
return stat;
}
'
main='
std::cout << "max_num_threads " << omp_get_max_threads() << std::endl;
RNGScope scope;
mat summaries = SEIR_sim_plus_summaries();
return wrap(summaries);
'
plug = getPlugin("RcppArmadillo")
## modify the plugin for Rcpp to support OpenMP
plug$env$PKG_CXXFLAGS <- paste('-fopenmp', plug$env$PKG_CXXFLAGS)
plug$env$PKG_LIBS <- paste('-fopenmp -lgomp', plug$env$PKG_LIBS)
SEIR_sim_summary = cxxfunction(sig=signature(),main,settings=plug,inc = paste(misc,model_and_summary),verbose=TRUE)
SEIR_sim_summary()
Thanks for the help!
NB: before you ask, I slightly modified the Rcpp multinomial sampling function just because I liked that way more than the one using pointer...not any other particular reason! :)
The core pseudo-random number generators (PRNGs) in R are not designed to be used in multithreaded environments. That is, their state is stored in a static array (dummy from src/main/PRNG.c) and therefore is shared among all threads. Moreover several other static structures are used to store states for the higher-level interfaces to the core PRNGs.
A possible solution could be that you put each call to rnorm() or other sampling functions inside named critical sections with all having the same name, e.g.:
...
#pragma omp critical(random)
rN(k) = ((pp < 1.) ? (rbinom(1,(double) n, pp))(0) : n);
...
if((rates(0)+rates(6))>0){
#pragma omp critical(random)
nX = rbinom(1,S_prev,1-exp(-(rates(0)+rates(6))*tau))(0);
...
Note that the critical construct operates on the structured block following it and therefore locks the entire statement. If a random number is being drawn inline inside a call to a time-consuming function, e.g.
#pragma omp critical(random)
x = slow_computation(rbinom(...));
this is better transformed to:
#pragma omp critical(random)
rb = rbinom(...);
x = slow_computation(rb);
That way only the rb = rbinom(...); statement will be protected.