Rcpp loop update variable inside - c++

I am a new user of Rcpp and I am writing an package.
I have defined two functions in one script and try to call one from another in the loop.
One of my function defined as below:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
double timesTwo(colvec x, NumericVector group, double k,
NumericVector unique_group)
{
vec beta(x.begin(),x.size(),false);
vec Group(group.begin(),group.size(),false);
vec unigroup(unique_group.begin(),unique_group.size(),false);
beta = pow(beta,k);
int g = unigroup.size();
int j = 0;
uvec st ;
double b=0;
for(j = 0; j < g; j++)
{
st = find(Group == unigroup[j]);
b = b + abs(pow(sum(beta.elem(st)),1/k));
}
double s = b;
return s;
}
And the loop I use to to call this function is like below:
XtXi_beta_plus.col(i) = X_MAT * BETA_NEW.col(2*i);
XtXi_beta_minus.col(i) = X_MAT * BETA_NEW.col(2*i+1);
loss_new_1.col(i) = (Y - ited / (ited + exp( -XtXi_beta_plus.col(i))));
loss_new_2.col(i) = (Y - ited / (ited + exp( -XtXi_beta_minus.col(i))));
new_loss(2*i) = accu(loss_new_1.col(i) % loss_new_1.col(i));
new_loss(2*i+1) = accu(loss_new_2.col(i) % loss_new_2.col(i));
z = BETA_NEW.col(2*i);
w = BETA_NEW.col(2*i+1);
// when 88 line was change to BETA_NEW.col(2*i) there is an error
// if you keep use Z, there is no update
// best!
pen_new_positive(i) = as<double>(timesTwo(z,group,k,unique_group));
My question is just like the comment I said in the loop, since I want to update that pen_new_postive(i) based on the BETA_NEW.col(2*i) However when I directly put BETA_NEW.COL(2*i) inside the timesTwo function, no matter how I change input
type of function (colvec or mat or whatever) there is error like below:
cannot convert "const::arma::subview_col<double> to "SEXP" in
initialization"
However when I directly use z in the timesTwo function, there is no update for my z in the loop.
Anyone could give me a hint about how to deal with this?
The full version of my code in second block as below:
#include <RcppArmadillo.h>
#include <math.h>
//#include <omp.h>
using namespace Rcpp;
using namespace arma;
//// [[Rcpp::plugins(openmp)]]
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
List minghan21041(NumericMatrix beta_new, NumericVector diff_loss,
NumericVector beta, double step_size, NumericVector y,
double k,
NumericVector group,
NumericVector unique_group,
NumericMatrix X) {
int n = X.nrow(), p=X.ncol() , n1=beta_new.nrow(), p1=beta_new.ncol();
mat X_MAT(X.begin(),n,p,false), BETA_NEW(beta_new.begin(),n1,p1,false);
vec BETA(beta.begin(),beta.size(),false);
vec Y(y.begin(),y.size(),false),
Diff_Loss(diff_loss.begin(),diff_loss.size(),false), iter(p,fill::zeros);
mat XtXi_beta_plus(n,p,fill::zeros);
mat XtXi_beta_minus(n,p,fill::zeros);
vec ited(n,fill::ones);
mat loss_new_1(n,p, fill::zeros),loss_new_2(n,p, fill::zeros);
colvec loss_new_3(n,fill::zeros);
vec new_loss(p1,fill::zeros);
uword index;
vec beta_final(p,fill::zeros);
vec pen_new_positive(p,fill::zeros);
vec pen_new_negative(p,fill::zeros);
double pen_old = 0;
Function timesTwo( "timesTwo" );
double cs=0;
//Col dc(BETA.begin(),p,1);
//pen_old = as<double>(timesTwo(dc,group,k,unique_group));
vec z(p,fill::zeros);
vec w(p,fill::zeros);
int i = 0;
vec XtXi_beta_old = X_MAT*BETA;
loss_new_3= (Y - ited / (ited + exp(-XtXi_beta_old)));
double loss_old_one = accu(loss_new_3%loss_new_3);
//#pragma omp parallel private(i) num_threads(4)
//{
//#pragma omp for ordered schedule(static,1)
for(i= 0; i<p; i++){
//#pragma omp ordered
//{
iter(i) = step_size;
BETA_NEW.col(2*i) = BETA + iter;
BETA_NEW.col(2*i+1) = BETA - iter;
XtXi_beta_plus.col(i) = X_MAT * BETA_NEW.col(2*i);
XtXi_beta_minus.col(i) = X_MAT * BETA_NEW.col(2*i+1);
loss_new_1.col(i) = (Y - ited / (ited + exp( -XtXi_beta_plus.col(i))));
loss_new_2.col(i) = (Y - ited / (ited + exp( -XtXi_beta_minus.col(i))));
new_loss(2*i) = accu(loss_new_1.col(i) % loss_new_1.col(i));
new_loss(2*i+1) = accu(loss_new_2.col(i) % loss_new_2.col(i));
z = BETA_NEW.col(2*i);
w = BETA_NEW.col(2*i+1);
// when 88 line was change to BETA_NEW.col(2*i) there is an error
// if you keep use Z, there is no update, you can source this file and I
//believe there is no other error
// best!
pen_new_positive(i) = as<double>(timesTwo(BETA_NEW.col(2*i),group,k,unique_group));
cs = pen_new_positive(i);
//Rcout << "cs" << cs << std::endl;
Rcout << "cs" << z << std::endl;
//pen_new_negative = as< std::vector<double> >(time(w,group,k,unique_group));
Diff_Loss(2*i) = new_loss(2*i) - loss_old_one + cs;
Diff_Loss(2*i+1) = new_loss(2*i+1) - loss_old_one + cs;
iter(i) = 0;
}
//}
//}
index = Diff_Loss.index_min();
beta_final = BETA_NEW.col(index);
return List::create( _["index"] = wrap(index),
_["Diff_Loss"]= wrap(Diff_Loss[index]),
_["ste"] =wrap(Diff_Loss),
_["beta_new"] = wrap(beta_final),
_["New_LOSS"]= wrap(new_loss[index]),
_["t"] = wrap(pen_new_positive));
}

Related

Rcpp function returning different values occasionally when run multiple times

I am currently implementing Clenshaw's algorithm with Rcpp to speed up my previous implementation in R. My current implementation is as follows (note that I am using RcppParallel for other functions defined in the same source file; RcppParalell is not used in this specific function, but I've left the headers in case this is somehow relevant):
#include <Rcpp.h>
#include <RcppParallel.h>
using namespace Rcpp;
using namespace RcppParallel;
// [[Rcpp::plugins("cpp11")]]
// [[Rcpp::export]]
NumericVector clenshawAllDerivatives(double t, int N, double Ta, double Tb, NumericVector Coeffs, int derivativesOrder) {
double tau = (2*t-Ta-Tb)/(Tb-Ta);
double helperValues[derivativesOrder + 1][3];
double scale;
for(double i = N; i > 1; i--) {
helperValues[0][2] = helperValues[0][1];
helperValues[0][1] = helperValues[0][0];
helperValues[0][0] = 2*tau*helperValues[0][1]-helperValues[0][2] + Coeffs[i - 1];
scale=2.0;
for(int j = 1; j <= derivativesOrder; j++) {
helperValues[j][2] = helperValues[j][1];
helperValues[j][1] = helperValues[j][0];
helperValues[j][0] = scale*helperValues[j-1][1] + 2*tau*helperValues[j][1] - helperValues[j][2];
scale += 2.0;
}
}
NumericVector output(derivativesOrder + 1);
output[0] = tau*helperValues[0][0] - helperValues[0][1] + Coeffs[0];
scale = 1.0;
double scale2initial = ((Tb-Ta)/2 * 86400.0), scale2 = scale2initial;
for(int j = 1; j <= derivativesOrder; j++) {
output[j] = (scale*helperValues[j-1][0] + tau*helperValues[j][0] - helperValues[j][1]) / scale2;
scale += 1.0;
scale2 = scale2 * scale2initial;
}
return output;
}
An example of application of the function, with example input values:
clenshawAllDerivatives(59568.5, 11, 59568, 59584, c(-1.281626e+06, -4.069960e+03, 2.725817e+01, -9.715712e-02, -1.115373e-03, -5.121949e-04, -9.068147e-05, -6.829206e-06, 1.657523e-07 , 1.406006e-07, 2.273966e-08), 1)
When run multiple times, this returns most often the expected correct output of c(-1.277790e+06, -6.037188e-03). However, sometimes it returns instead wrong values, typically very high numbers.
Any help to identify the cause of this unexpected behavior would be greatly appreciated!

How to speed-up Log-sum-exp function over multidimensional R Arrays?

I am working on speeding up a program I wrote in R. The code involves repeatedly computing LogSumExp over multidimensional arrays, i.e computing s_lnj = exp(u_lnj) / (1 + sum_k exp(u_lnk)). The base R version of the code I am trying to increase the speed of is the following:
log_sum_exp_func <- function(vec){
max_vec <- max(vec)
return(max_vec + log(sum(exp(vec-max_vec))))
}
compute_share_from_utils_func <- function(u_lnj){
### get dimensions
L <- dim(u_lnj)[1]; n_poly <- dim(u_lnj)[2]; J <- dim(u_lnj)[3]
### compute denominator of share, 1 + sum exp utils
den_ln <- 1 + exp(apply(u_lnj, c(1,2), log_sum_exp_func))
den_lnj <- array(rep(den_ln, J), dim = c(L, n_poly, J))
### take ratio of utils and denominator
s_lnj <- exp(u_lnj) / den_lnj
return(s_lnj)
}
I tried to use xtensor and Rcpp to speed things up, but ran into several issues. The Rcpp code I wrote is the following
// [[Rcpp::depends(xtensor)]]
// [[Rcpp::plugins(cpp14)]]
#include <numeric> // Standard library import for std::accumulate
#define STRICT_R_HEADERS // Otherwise a PI macro is defined in R
#include "xtensor/xmath.hpp" // xtensor import for the C++ universal functions
#include "xtensor/xarray.hpp"
#include "xtensor/xio.hpp"
#include "xtensor/xview.hpp"
#include "xtensor-r/rarray.hpp" // R bindings
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double cxxlog_sum_exp_vec(xt::rarray<double>& m)
{
auto shape_m = m.shape();
double maxvec = xt::amax(m)[0];
xt::rarray<double> arr_maxvec = maxvec * xt::ones<double>(shape_m);
xt::rarray<double> vec_min_max = m - arr_maxvec;
xt::rarray<double> exp_vec_min_max = xt::exp(vec_min_max);
double sum_exp = xt::sum(exp_vec_min_max)[0];
double log_sum_exp = std::log(sum_exp);
return log_sum_exp + maxvec;
}
// [[Rcpp::export]]
xt::rarray<double> cxxshare_from_utils(xt::rarray<double>& u_lnj)
{
int L = u_lnj.shape(0);
int N = u_lnj.shape(1);
int J = u_lnj.shape(2);
xt::rarray<double> res = xt::ones<double>({L,N,J});
for (std::size_t l = 0; l < u_lnj.shape()[0]; ++l)
{
for (std::size_t n = 0; n < u_lnj.shape()[1]; ++n)
{
xt::rarray<double> utils_j = xt::view(u_lnj, l, n, xt::all());
double inv_lse = 1 / (1 + std::exp(cxxlog_sum_exp_vec(utils_j)));
for (std::size_t j = 0; j < J; ++j)
{
res(l, n, j) = std::exp(u_lnj(l, n, j)) * inv_lse;
}
}
}
return res;
}
The Rcpp implementation does seem to yield the same results as the base R code, however it seems to encounter problems whenever the dimensions of the input array increase. My R Session fails if I run
L <- 100
n <- 100
J <- 200
u_lnj <- array(rnorm(L*n*J,0,2), dim = c(L, n, J))
test <- cxxshare_from_utils(u_lnj)
But the code runs fine for L, n, J = 10,10,20 for instance. Moreover, the C++ implementation of log_sum_exp does not seem to outperform the base R version that much.
EDIT: I could not figure out what was the issue with the way I am using xtensor. But I did get some speed up with the following RcppArmadillo code. The drawback of this version is that is likely not as robust to overflow as the base R function relying on Log Sum Exp.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::plugins(cpp14)]]
// [[Rcpp::export]]
arma::cube cxxarma_share_from_utils(arma::cube u_lnj) {
// Extract the different dimensions
// Normal Matrix dimensions
unsigned int L = u_lnj.n_rows;
unsigned int N = u_lnj.n_cols;
// Depth of Array
unsigned int J = u_lnj.n_slices;
//resulting cube
arma::cube s_lnj = arma::exp(u_lnj);
for (unsigned int l = 0; l < L; l++) {
for (unsigned int n = 0; n < N; n++) {
double den = 1 / (1 + arma::accu(s_lnj.subcube(arma::span(l), arma::span(n), arma::span())));
for (unsigned int j = 0; j < J; j++) {
s_lnj(l, n, j) = s_lnj(l, n, j) * den;
}
}
}
return s_lnj;
}

define and filling a sparse matrix using Eigen Library in C++

I am trying to build a spars Matrix using a Eigen or Armadillo library in C++ to solve a system of linear equations Ax=b. A is the coefficient matrix with a dimension of n*n, and B is a vector of right hand side with a dimension of n
the Spars Matrix A is like this, see the figure
I had a look though the Eigen document but I have a problem with defining and filling the Spars Matrix in C++.
could you please give me an example code to define the spars matrix and how to fill the values into the matrix using Eigen library in c++?
consider for example a simple spars matrix A:
1 2 0 0
0 3 0 0
0 0 4 5
0 0 6 7
int main()
{
SparseMatrix<double> A;
// fill the A matrix ????
VectorXd b, x;
SparseCholesky<SparseMatrix<double> > solver;
solver.compute(A);
x = solver.solve(b);
return 0;
}
The sparse matrix could be filled with the values mentioned in the post by using the .coeffRef() member function, as shown in this routine:
SparseMatrix<double> fillMatrix() {
int N = 4;
int M = 4;
SparseMatrix<double> m1(N,M);
m1.reserve(VectorXi::Constant(M, 4)); // 4: estimated number of non-zero enties per column
m1.coeffRef(0,0) = 1;
m1.coeffRef(0,1) = 2.;
m1.coeffRef(1,1) = 3.;
m1.coeffRef(2,2) = 4.;
m1.coeffRef(2,3) = 5.;
m1.coeffRef(3,2) = 6.;
m1.coeffRef(3,3) = 7.;
m1.makeCompressed();
return m1;
}
However, the SparseCholesky module (SimplicialCholesky<SparseMatrix<double> >) won't work in this case because the matrix is not Hermitian. The system could be solved with a LU or BiCGStab solver. Also note that sizes ofx and b need to be defined:
VectorXd b(A.rows()), x(A.cols());
In case of larger sparse matrices you may also want to look at the .reserve() function in order to allocate memory before filling the elements. The .reserve() function can be used to provide an estimate of the number of non-zero entries per column (or row, depending on the storage order. The default is comumn-major). In the example above that estimate is 4, but it does not make sense in such a small matrix. The documentation states that it is preferable to overestimate the number of non-zeros per column.
Since this question also asks about Armadillo, here is the corresponding Armadillo-based code. Best to use Armadillo version 9.100+ or later, and link with SuperLU.
#include <armadillo>
using namespace arma;
int main()
{
sp_mat A(4,4); // don't need to explicitly reserve the number of non-zeros
// fill with direct element access
A(0,0) = 1.0;
A(0,1) = 2.0;
A(1,1) = 3.0;
A(2,2) = 4.0;
A(2,3) = 5.0;
A(3,2) = 6.0;
A(3,3) = 7.0; // etc
// or load the sparse matrix from a text file with the data stored in coord format
sp_mat AA;
AA.load("my_sparse_matrix.txt", coord_ascii)
vec b; // ... fill b here ...
vec x = spsolve(A,b); // solve sparse system
return 0;
}
See also the documentation for SpMat, element access, .load(), spsolve().
The coord file format is simple. It stores non-zeros values.
Each line contains:
row col value
The row and column counts start at zero. Example:
0 0 1.0
0 1 2.0
1 1 3.0
2 2 4.0
2 3 5.0
3 2 6.0
3 3 7.0
1000 2000 9.0
Values not explicitly listed are assumed to be zero.
#include <vector>
#include <iostream>
#include <Eigen/Dense>
#include <Eigen/Sparse>
#include <Eigen/Core>
#include <cstdlib>
using namespace Eigen;
using namespace std;
int main()
{
double L = 5; // Length
const int N = 120; // No of cells
double L_cell = L / N;
double k = 100; // Thermal Conductivity
double T_A = 100.;
double T_B = 200.;
double S = 1000.;
Vector<double, N> d, D, A, aL, aR, aP, S_u, S_p;
vector<double> xp;
xp.push_back((0 + L_cell) / 2.0);
double xm = xp[0];
for (int i = 0; i < N - 1; i++)
{
xm = xm + L_cell;
xp.push_back(xm);
}
for (int i = 0; i < N; i++)
{
A(i) = .1;
d(i) = L_cell;
D(i) = k / d(i);
}
aL(0) = 0;
aR(0) = D(0) * A(0);
S_p(0) = -2 * D(0) * A(0);
aP(0) = aL(0) + aR(0) - S_p(0);
S_u(0) = 2 * D(0) * A(0) * T_A + S * L_cell * A(0);
for (int i = 1; i < N - 1; i++)
{
aL(i) = D(i) * A(i);
aR(i) = D(i) * A(i);
S_p(i) = 0;
aP(i) = aL(i) + aR(i) - S_p(i);
S_u(i) = S * A(i) * L_cell;
}
aL(N - 1) = D(N - 1) * A(N - 1);
aR(N - 1) = 0;
S_p(N - 1) = -2 * D(N - 1) * A(N - 1);
aP(N - 1) = aL(N - 1) + aR(N - 1) - S_p(N - 1);
S_u(N - 1) = 2 * D(N - 1) * A(N - 1) * T_B + S * L_cell * A(N - 1);
typedef Eigen::Triplet<double> T;
std::vector<T> tripletList;
tripletList.reserve(N * 3);
Matrix<double, N, 3> v; // v is declared here
v << (-1) * aL, aP, (-1) * aR;
for (int i = 0, j = 0; i < N && j < N; i++, j++)
{
tripletList.push_back(T(i, j, v(i, 1)));
if (i + 1 < N && j + 1 < N)
{
tripletList.push_back(T(i + 1, j, v(i + 1, 0)));
tripletList.push_back(T(i, j + 1, v(i, 2)));
}
}
SparseMatrix<double> coeff(N, N);
coeff.setFromTriplets(tripletList.begin(), tripletList.end());
SimplicialLDLT<SparseMatrix<double> > solver;
solver.compute(coeff);
if (solver.info() != Success) {
cout << "decomposition failed" << endl;
return;
}
Vector<double, N> temparature;
temparature = solver.solve(S_u);
if (solver.info() != Success)
{
cout << "Solving failed" << endl;
return;
}
vector<double> Te = {}, x = {};
Te.push_back(T_A);
x.push_back(0);
for (int i = 0; i < N; i++)
{
Te.push_back(temparature(i));
x.push_back(xp[i]);
}
Te.push_back(T_B);
x.push_back(L);
for (int i = 0; i < N + 2; i++)
{
cout << x[i] << " " << Te[i] << endl;
}
return 0;
}
Here is a full code of a solution to numerical problem which uses SparseMatrix. Look at the matrix v. It has the values of all the nonzero elements of coeff matrix yet to be defined. In the next loop I made a series of tripletList.push_back(...) adding a triplet consisting of row and column index and corresponding value taken from v for each non-zero element of coeff. Now declare a Sparse Matrix coeff with appropriate size and use the method setFromTriplets (documentation) to set its non-zero elements from tripletList triplets.

RcppParallel Parallelizing distance computation: segfault

I have a matrix, for which I want to compute the distance (let's say Euclidean) between the ith row and every other row(i.e. I want the ith row of the pairwise distance matrix).
#include <Rcpp.h>
#include <cmath>
#include <algorithm>
#include <RcppParallel.h>
//#include <RcppArmadillo.h>
#include <queue>
using namespace std;
using namespace Rcpp;
using namespace RcppParallel;
// [[Rcpp::export]]
double dist_fun(NumericVector row1, NumericVector row2){
double rval = 0;
for (int i = 0; i < row1.length(); i++){
rval += (row1[i] - row2[i]) * (row1[i] - row2[i]);
}
return rval;
}
// [[Rcpp::export]]
NumericVector dist_row(NumericMatrix mat, int i){
NumericVector row(mat.nrow());
NumericMatrix::Row row1 = mat.row(i - 1);
for (int j = 0; j < mat.nrow(); j++){
NumericMatrix::Row row2 = mat.row(j);
row(j) = dist_fun(row1, row2);
}
return row;
}
// [[Rcpp::depends(RcppParallel)]]
struct JsDistance: public Worker {
// input matrix to read from
const NumericMatrix mat;
int i;
// output vector to write to
NumericVector output;
// initialize from Rcpp input and output matrixes (the RMatrix class
// can be automatically converted to from the Rcpp matrix type)
JsDistance(const NumericMatrix mat, int i, NumericVector output)
: mat(mat), i(i), output(output) {}
// function call operator that work for the specified range (begin/end)
void operator()(std::size_t begin, std::size_t end) {
NumericVector row1 = mat.row(i);
for (std::size_t j = begin; j < end; j++) {
NumericVector row2 = mat.row(j);
output[j] = dist_fun(row1, row2);
}
}
};
// [[Rcpp::export]]
NumericVector parallel_dist_row(NumericMatrix mat, int i) {
// allocate the matrix we will return
NumericVector output(mat.nrow());
// create the worker
JsDistance JsDistance(mat, i, output);
// call it with parallelFor
parallelFor(0, mat.nrow(), JsDistance);
return output;
}
The sequential way using Rcpp is the function 'row_dist' as written above. Yet the matrix I want to work with is very large so I want to parallelize it. But then I will run into a segfault error which I don't quite understand why. To trigger the error you can run the following code:
library(Rcpp)
library(RcppParallel)
setThreadOptions(numThreads = 20)
set.seed(42)
X = matrix(rnorm(10000 * 400), 10000, 400)
sourceCpp("question.cpp")
start1 = proc.time()
print(dist_row(X, 2)[1:30])
print(proc.time() - start1)
start2 = proc.time()
print(parallel_dist_row(X, 2)[1:30])
print(proc.time() - start2)
Can someone give me some hint about what I did wrong? Thanks in advance for your time!
=======================================================================
Edit:
inline double d(double a, double b){
return fabs(a - b);
}
// [[Rcpp::depends(RcppParallel)]
struct dtwDistance: public Worker {
// Input matrix to read from must be of the RMatrix<T> form
// if using Rcpp objects
const RMatrix<double> mat;
int i;
// Output vector to write to must be of the RVector<T> form
// if using Rcpp objects
RVector<double> output;
// initialize from Rcpp input and output matrixes (the RMatrix class
// can be automatically converted to from the Rcpp matrix type)
dtwDistance(const NumericMatrix mat, int i, NumericVector output)
: mat(mat), i(i - 1), output(output) {}
// Note the -1 ^^^^ to match results from prior function
// Function call operator to iterate over a specified range (begin/end)
void operator()(std::size_t begin, std::size_t end) {
RMatrix<double>::Row row1 = mat.row(i);
for (std::size_t j = begin; j < end; ++j) {
RMatrix<double>::Row row2 = mat.row(j);
size_t n = row1.length();
size_t m = row2.length();
NumericMatrix cost(n + 1, m + 1);
for (int ii = 1; ii <= n; ii++){
cost(i, 0) = numeric_limits<double>::infinity();
}
for (int jj = 1; jj <= m; jj++){
cost(0, j) = numeric_limits<double>::infinity();
}
for (int ii = 1; ii <= n; ii++){
for (int jj = 1; jj <= m; jj++){
double dist = d(row1[ii - 1], row2[jj - 1]);
cost(ii, jj) = dist + min(min(cost(ii - 1, jj), cost(ii, jj - 1)), cost(ii - 1, jj - 1));
//cout << ii << ", " << jj << ", " << cost(ii, jj) << "\n";
}
}
output[j] = cost(n, m);
}
}
};
// [[Rcpp::export]]
NumericVector parallel_dist_row_dtw(NumericMatrix mat, int i) {
// allocate the matrix we will return
//RMatrix<double> input(mat);
NumericVector y(mat.nrow());
//RVector<double> output(y);
// create the worker
dtwDistance dtwDistance(mat, i, y);
// call it with parallelFor
parallelFor(0, mat.nrow(), dtwDistance);
return y;
}
The distance I needed to calculate is the dynamic time warping distance. I implemented it as above. Yet when running, it will give a 'stack imbalance' warning. And there will be a segfault after several runs. I'm wondering what is the problem now.
To trigger the problem, I did:
library(Rcpp)
library(RcppParallel)
setThreadOptions(numThreads = 4)
sourceCpp("scripts/chisq_dtw.cpp")
set.seed(42)
X = matrix(rnorm(1000), 100, 10)
parallel_dist_row_dtw(X, 1)
parallel_dist_row_dtw(X, 2)
parallel_dist_row_dtw(X, 3)
parallel_dist_row_dtw(X, 4)
parallel_dist_row_dtw(X, 5)
The issue is you are not using the thread-safe wrapper around R objects via RMatrix<T> and RVector<T>. These classes are important because of the parallelization being executed on a background thread, which is an area that is not safe to call R or Rcpp APIs. The official documentation emphasizes this in the Safe Accessors section.
In particular, we have:
To provide safe and convenient access to the arrays underlying R vectors and matrices RcppParallel introduces several accessor classes:
RVector<T> — Wrap R vectors of various types
RMatrix<T> — Wrap R matrices of various types (also includes Row and Column classes)
To create a thread safe accessor for an Rcpp vector or matrix just construct an instance of RVector or RMatrix with it.
Code Fix
So, your work can be fixed by switching *Matrix to RMatrix<T> and *Vector to RVector<T>.
struct JsDistance: public Worker {
// Input matrix to read from must be of the RMatrix<T> form
// if using Rcpp objects
const RMatrix<double> mat;
int i;
// Output vector to write to must be of the RVector<T> form
// if using Rcpp objects
RVector<double> output;
// initialize from Rcpp input and output matrixes (the RMatrix class
// can be automatically converted to from the Rcpp matrix type)
JsDistance(const NumericMatrix mat, int i, NumericVector output)
: mat(mat), i(i - 1), output(output) {}
// Note the -1 ^^^^ to match results from prior function
// Function call operator to iterate over a specified range (begin/end)
void operator()(std::size_t begin, std::size_t end) {
RMatrix<double>::Row row1 = mat.row(i);
for (std::size_t j = begin; j < end; ++j) {
RMatrix<double>::Row row2 = mat.row(j);
double rval = 0;
for (unsigned int k = 0; k < row1.length(); ++k) {
rval += (row1[k] - row2[k]) * (row1[k] - row2[k]);
}
output[j] = rval;
}
}
};
In particular, the data types used here are of the form RMatrix<double> even for accessing the matrix.
Also, within the parallelized version there is a missing i-1 statement. To remedy this, I've opted to have it taken care of in the constructor of JSDistance.
Test
set.seed(42)
X = matrix(rnorm(10000 * 400), 10000, 400)
start1 = proc.time()
print(dist_row(X, 2)[1:30])
# [1] 811.8873 0.0000 799.8153 810.1442 720.3232 730.6083 797.8441 781.8066 827.1511 834.1863 842.9392 850.2476 724.5842 673.1428 775.0994
# [16] 805.5752 804.9281 774.9770 799.7669 870.3187 815.1129 934.7581 726.1554 804.2097 758.4943 772.8931 806.6026 715.8257 847.8980 831.7555
print(proc.time() - start1)
# user system elapsed
# 0.22 0.00 0.23
start2 = proc.time()
print(parallel_dist_row(X, 2)[1:30])
# [1] 811.8873 0.0000 799.8153 810.1442 720.3232 730.6083 797.8441 781.8066 827.1511 834.1863 842.9392 850.2476 724.5842 673.1428 775.0994
# [16] 805.5752 804.9281 774.9770 799.7669 870.3187 815.1129 934.7581 726.1554 804.2097 758.4943 772.8931 806.6026 715.8257 847.8980 831.7555
print(proc.time() - start2)
# user system elapsed
# 0.28 0.00 0.06
all.equal(parallel_dist_row(X, 2), dist_row(X, 2))
# [1] TRUE

Memory Overflow? std::badalloc

I have a program that solves generally for 1D brownian motion using an Euler's Method.
Being a stochastic process, I want to average it over many particles. But I find that as I ramp up the number of particles, it overloads and i get the std::badalloc error, which I understand is a memory error.
Here is my full code
#include <iostream>
#include <vector>
#include <fstream>
#include <cmath>
#include <cstdlib>
#include <limits>
#include <ctime>
using namespace std;
// Box-Muller Method to generate gaussian numbers
double generateGaussianNoise(double mu, double sigma) {
const double epsilon = std::numeric_limits<double>::min();
const double tau = 2.0 * 3.14159265358979323846;
static double z0, z1;
static bool generate;
generate = !generate;
if (!generate) return z1 * sigma + mu;
double u1, u2;
do {
u1 = rand() * (1.0 / RAND_MAX);
u2 = rand() * (1.0 / RAND_MAX);
} while (u1 <= epsilon);
z0 = sqrt(-2.0 * log(u1)) * cos(tau * u2);
z1 = sqrt(-2.0 * log(u1)) * sin(tau * u2);
return z0 * sigma + mu;
}
int main() {
// Initialize Variables
double gg; // Gaussian Number Picked from distribution
// Integrator
double t0 = 0; // Setting the Time Window
double tf = 10;
double n = 5000; // Number of Steps
double h = (tf - t0) / n; // Time Step Size
// Set Constants
const double pii = atan(1) * 4; // pi
const double eta = 1; // viscous constant
const double m = 1; // mass
const double aa = 1; // radius
const double Temp = 30; // Temperature in Kelvins
const double KB = 1; // Boltzmann Constant
const double alpha = (6 * pii * eta * aa);
// More Constants
const double mu = 0; // Gaussian Mean
const double sigma = 1; // Gaussian Std Deviation
const double ng = n; // No. of pts to generate for Gauss distribution
const double npart = 1000; // No. of Particles
// Initial Conditions
double x0 = 0;
double y0 = 0;
double t = t0;
// Vectors
vector<double> storX; // Vector that keeps displacement values
vector<double> storY; // Vector that keeps velocity values
vector<double> storT; // Vector to store time
vector<double> storeGaussian; // Vector to store Gaussian numbers generated
vector<double> holder; // Placeholder Vector for calculation operations
vector<double> mainstore; // Vector that holds the final value desired
storT.push_back(t0);
// Prepares mainstore
for (int z = 0; z < (n+1); z++) {
mainstore.push_back(0);
}
for (int NN = 0; NN < npart; NN++) {
holder.clear();
storX.clear();
storY.clear();
storT.clear();
storT.push_back(0);
// Prepares holder
for (int z = 0; z < (n+1); z++) {
holder.push_back(0);
storX.push_back(0);
storY.push_back(0);
}
// Gaussian Generator
srand(time(NULL));
for (double iiii = 0; iiii < ng; iiii++) {
gg = generateGaussianNoise(0, 1); // generateGaussianNoise(mu,sigma)
storeGaussian.push_back(gg);
}
// Solver
for (int ii = 0; ii < n; ii++) {
storY[ii + 1] =
storY[ii] - (alpha / m) * storY[ii] * h +
(sqrt(2 * alpha * KB * Temp) / m) * sqrt(h) * storeGaussian[ii];
storX[ii + 1] = storX[ii] + storY[ii] * h;
holder[ii + 1] =
pow(storX[ii + 1], 2); // Finds the displacement squared
t = t + h;
storT.push_back(t);
}
// Updates the Main Storage
for (int z = 0; z < storX.size(); z++) {
mainstore[z] = mainstore[z] + holder[z];
}
}
// Average over the number of particles
for (int z = 0; z < storX.size(); z++) {
mainstore[z] = mainstore[z] / (npart);
}
// Outputs the data
ofstream fout("LangevinEulerTest.txt");
for (int jj = 0; jj < storX.size(); jj++) {
fout << storT[jj] << '\t' << mainstore[jj] << '\t' << storX[jj] << endl;
}
return 0;
}
As you can see, npart is the variable that I change to vary the number of particles. But after each iteration, I do clear my storage vectors like storX,storY... So on paper, the number of particles should not affect memory? I am only just calling the compiler to repeat many more times, and add onto the main storage vector mainstore. I am running my code on a computer with 4GB ram.
Would greatly appreciate it if anyone could point out my errors in logic or suggest improvements.
Edit: Currently the number of particles is set to npart = 1000.
So when I try to ramp it up to like npart = 20000 or npart = 50000, it gives me memory errors.
Edit2 I've edited the code to allocate an extra index to each of the storage vectors. But it does not seem to fix the memory overflow
There is an out of bounds exception in the solver part. storY has size n and you access ii+1 where i goes up to n-1. So for your code provided. storY has size 5000. It is allowed to access with indices between 0 and 4999 (including) but you try to access with index 5000. The same for storX, holder and mainstore.
Also, storeGaussian does not get cleared before adding new variables. It grows by n for each npart loop. You access only the first n values of it in the solver part anyway.
Please note, that vector::clear removes all elements from the vector, but does not necessarily change the vector's capacity (i.e. it's storage array), see the documentation.
This won't cause the problem here, because you'll reuse the same array in the next runs, but it's something to be aware when using vectors.