How to speed up a log-sum-exp function over multidimensional R arrays? - c++

I am working on speeding up a program I wrote in R. The code repeatedly computes a log-sum-exp over multidimensional arrays, i.e., it computes s_lnj = exp(u_lnj) / (1 + sum_k exp(u_lnk)). The base R version of the code I am trying to speed up is the following:
log_sum_exp_func <- function(vec){
max_vec <- max(vec)
return(max_vec + log(sum(exp(vec-max_vec))))
}
compute_share_from_utils_func <- function(u_lnj){
### get dimensions
L <- dim(u_lnj)[1]; n_poly <- dim(u_lnj)[2]; J <- dim(u_lnj)[3]
### compute denominator of share, 1 + sum exp utils
den_ln <- 1 + exp(apply(u_lnj, c(1,2), log_sum_exp_func))
den_lnj <- array(rep(den_ln, J), dim = c(L, n_poly, J))
### take ratio of utils and denominator
s_lnj <- exp(u_lnj) / den_lnj
return(s_lnj)
}
I tried to use xtensor and Rcpp to speed things up, but ran into several issues. The Rcpp code I wrote is the following:
// [[Rcpp::depends(xtensor)]]
// [[Rcpp::plugins(cpp14)]]
#include <numeric> // Standard library import for std::accumulate
#define STRICT_R_HEADERS // Otherwise a PI macro is defined in R
#include "xtensor/xmath.hpp" // xtensor import for the C++ universal functions
#include "xtensor/xarray.hpp"
#include "xtensor/xio.hpp"
#include "xtensor/xview.hpp"
#include "xtensor-r/rarray.hpp" // R bindings
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double cxxlog_sum_exp_vec(xt::rarray<double>& m)
{
auto shape_m = m.shape();
double maxvec = xt::amax(m)[0];
xt::rarray<double> arr_maxvec = maxvec * xt::ones<double>(shape_m);
xt::rarray<double> vec_min_max = m - arr_maxvec;
xt::rarray<double> exp_vec_min_max = xt::exp(vec_min_max);
double sum_exp = xt::sum(exp_vec_min_max)[0];
double log_sum_exp = std::log(sum_exp);
return log_sum_exp + maxvec;
}
// [[Rcpp::export]]
xt::rarray<double> cxxshare_from_utils(xt::rarray<double>& u_lnj)
{
int L = u_lnj.shape(0);
int N = u_lnj.shape(1);
int J = u_lnj.shape(2);
xt::rarray<double> res = xt::ones<double>({L,N,J});
for (std::size_t l = 0; l < u_lnj.shape()[0]; ++l)
{
for (std::size_t n = 0; n < u_lnj.shape()[1]; ++n)
{
xt::rarray<double> utils_j = xt::view(u_lnj, l, n, xt::all());
double inv_lse = 1 / (1 + std::exp(cxxlog_sum_exp_vec(utils_j)));
for (std::size_t j = 0; j < J; ++j)
{
res(l, n, j) = std::exp(u_lnj(l, n, j)) * inv_lse;
}
}
}
return res;
}
The Rcpp implementation does seem to yield the same results as the base R code; however, it runs into problems as the dimensions of the input array increase. My R session fails if I run
L <- 100
n <- 100
J <- 200
u_lnj <- array(rnorm(L*n*J,0,2), dim = c(L, n, J))
test <- cxxshare_from_utils(u_lnj)
But the code runs fine for L, n, J = 10,10,20 for instance. Moreover, the C++ implementation of log_sum_exp does not seem to outperform the base R version that much.
EDIT: I could not figure out what the issue was with the way I am using xtensor. But I did get some speed-up with the following RcppArmadillo code. The drawback of this version is that it is likely not as robust to overflow as the base R function, which relies on log-sum-exp.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::plugins(cpp14)]]
// [[Rcpp::export]]
arma::cube cxxarma_share_from_utils(arma::cube u_lnj) {
// Extract the different dimensions
// Normal Matrix dimensions
unsigned int L = u_lnj.n_rows;
unsigned int N = u_lnj.n_cols;
// Depth of Array
unsigned int J = u_lnj.n_slices;
//resulting cube
arma::cube s_lnj = arma::exp(u_lnj);
for (unsigned int l = 0; l < L; l++) {
for (unsigned int n = 0; n < N; n++) {
double den = 1 / (1 + arma::accu(s_lnj.subcube(arma::span(l), arma::span(n), arma::span())));
for (unsigned int j = 0; j < J; j++) {
s_lnj(l, n, j) = s_lnj(l, n, j) * den;
}
}
}
return s_lnj;
}
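The overflow concern mentioned in the EDIT can be handled without going through a separate log-sum-exp call. The following is a minimal sketch in plain C++ (no Rcpp, Armadillo, or xtensor), assuming the shares of one (l, n) cell are computed from a vector of utilities; the name stable_shares is hypothetical. It treats the 1 in the denominator as exp(0) and shifts every exponent by m = max(0, max_j u_j), so no call to exp() receives a positive argument.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Shares s_j = exp(u_j) / (1 + sum_k exp(u_k)) for a single (l, n) cell.
// Hypothetical helper: shifting by m = max(0, max_j u_j) keeps every
// exponent <= 0, so exp() cannot overflow.
std::vector<double> stable_shares(const std::vector<double>& u) {
    double m = 0.0;                        // 0 accounts for the "1" = exp(0) term
    for (double v : u) m = std::max(m, v);

    double denom = std::exp(-m);           // the "1" in the denominator, rescaled
    std::vector<double> s(u.size());
    for (std::size_t j = 0; j < u.size(); ++j) {
        s[j] = std::exp(u[j] - m);         // rescaled numerator
        denom += s[j];
    }
    for (double& v : s) v /= denom;        // the common factor exp(-m) cancels
    return s;
}
```

The same shift could be applied inside the two outer loops of the Armadillo version, one (l, n) cell at a time.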

Related

Rcpp function returning different values occasionally when run multiple times

I am currently implementing Clenshaw's algorithm with Rcpp to speed up my previous implementation in R. My current implementation is as follows (note that I am using RcppParallel for other functions defined in the same source file; RcppParallel is not used in this specific function, but I've left the headers in case this is somehow relevant):
#include <Rcpp.h>
#include <RcppParallel.h>
using namespace Rcpp;
using namespace RcppParallel;
// [[Rcpp::plugins("cpp11")]]
// [[Rcpp::export]]
NumericVector clenshawAllDerivatives(double t, int N, double Ta, double Tb, NumericVector Coeffs, int derivativesOrder) {
double tau = (2*t-Ta-Tb)/(Tb-Ta);
double helperValues[derivativesOrder + 1][3];
double scale;
for(double i = N; i > 1; i--) {
helperValues[0][2] = helperValues[0][1];
helperValues[0][1] = helperValues[0][0];
helperValues[0][0] = 2*tau*helperValues[0][1]-helperValues[0][2] + Coeffs[i - 1];
scale=2.0;
for(int j = 1; j <= derivativesOrder; j++) {
helperValues[j][2] = helperValues[j][1];
helperValues[j][1] = helperValues[j][0];
helperValues[j][0] = scale*helperValues[j-1][1] + 2*tau*helperValues[j][1] - helperValues[j][2];
scale += 2.0;
}
}
NumericVector output(derivativesOrder + 1);
output[0] = tau*helperValues[0][0] - helperValues[0][1] + Coeffs[0];
scale = 1.0;
double scale2initial = ((Tb-Ta)/2 * 86400.0), scale2 = scale2initial;
for(int j = 1; j <= derivativesOrder; j++) {
output[j] = (scale*helperValues[j-1][0] + tau*helperValues[j][0] - helperValues[j][1]) / scale2;
scale += 1.0;
scale2 = scale2 * scale2initial;
}
return output;
}
An example of application of the function, with example input values:
clenshawAllDerivatives(59568.5, 11, 59568, 59584, c(-1.281626e+06, -4.069960e+03, 2.725817e+01, -9.715712e-02, -1.115373e-03, -5.121949e-04, -9.068147e-05, -6.829206e-06, 1.657523e-07 , 1.406006e-07, 2.273966e-08), 1)
When run multiple times, this returns most often the expected correct output of c(-1.277790e+06, -6.037188e-03). However, sometimes it returns instead wrong values, typically very high numbers.
Any help to identify the cause of this unexpected behavior would be greatly appreciated!
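One thing that stands out in the code above is that helperValues is declared without an initializer, yet helperValues[0][1] and helperValues[0][0] are read in the very first loop iteration. Reading uninitialized memory would be consistent with results that are usually right (when the memory happens to hold zeros) and occasionally huge. The following is a sketch in plain C++ of the same recurrences with every helper value starting at zero. It uses std::vector instead of NumericVector so it compiles without R, takes the number of coefficients from coeffs.size() instead of a separate N, and uses an integer loop index; the function name is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Same recurrences as clenshawAllDerivatives, but all helper values start at 0.
std::vector<double> clenshaw_all_derivatives(double t, double Ta, double Tb,
                                             const std::vector<double>& coeffs,
                                             int derivativesOrder) {
    const double tau = (2 * t - Ta - Tb) / (Tb - Ta);
    // Value-initialised: every entry is 0.0 before the first read.
    std::vector<std::array<double, 3>> h(derivativesOrder + 1,
                                         std::array<double, 3>{});
    for (int i = static_cast<int>(coeffs.size()); i > 1; --i) {
        h[0][2] = h[0][1];
        h[0][1] = h[0][0];
        h[0][0] = 2 * tau * h[0][1] - h[0][2] + coeffs[i - 1];
        double scale = 2.0;
        for (int j = 1; j <= derivativesOrder; ++j) {
            h[j][2] = h[j][1];
            h[j][1] = h[j][0];
            h[j][0] = scale * h[j - 1][1] + 2 * tau * h[j][1] - h[j][2];
            scale += 2.0;
        }
    }
    std::vector<double> out(derivativesOrder + 1);
    out[0] = tau * h[0][0] - h[0][1] + coeffs[0];
    const double scale2initial = (Tb - Ta) / 2 * 86400.0;
    double scale = 1.0, scale2 = scale2initial;
    for (int j = 1; j <= derivativesOrder; ++j) {
        out[j] = (scale * h[j - 1][0] + tau * h[j][0] - h[j][1]) / scale2;
        scale += 1.0;
        scale2 *= scale2initial;
    }
    return out;
}
```

In the Rcpp version, the equivalent change would be to zero-fill helperValues before the first loop.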

define and filling a sparse matrix using Eigen Library in C++

I am trying to build a sparse matrix using the Eigen or Armadillo library in C++ to solve a system of linear equations Ax=b. A is the coefficient matrix with dimension n*n, and b is the right-hand-side vector of length n.
The sparse matrix A looks like this (see the figure).
I had a look through the Eigen documentation, but I have a problem with defining and filling the sparse matrix in C++.
Could you please give me example code that defines the sparse matrix and fills in its values using the Eigen library in C++?
Consider for example a simple sparse matrix A:
1 2 0 0
0 3 0 0
0 0 4 5
0 0 6 7
int main()
{
SparseMatrix<double> A;
// fill the A matrix ????
VectorXd b, x;
SparseCholesky<SparseMatrix<double> > solver;
solver.compute(A);
x = solver.solve(b);
return 0;
}
The sparse matrix could be filled with the values mentioned in the post by using the .coeffRef() member function, as shown in this routine:
SparseMatrix<double> fillMatrix() {
int N = 4;
int M = 4;
SparseMatrix<double> m1(N,M);
m1.reserve(VectorXi::Constant(M, 4)); // 4: estimated number of non-zero entries per column
m1.coeffRef(0,0) = 1;
m1.coeffRef(0,1) = 2.;
m1.coeffRef(1,1) = 3.;
m1.coeffRef(2,2) = 4.;
m1.coeffRef(2,3) = 5.;
m1.coeffRef(3,2) = 6.;
m1.coeffRef(3,3) = 7.;
m1.makeCompressed();
return m1;
}
However, the SparseCholesky module (SimplicialCholesky<SparseMatrix<double> >) won't work in this case because the matrix is not Hermitian. The system could be solved with an LU or BiCGStab solver instead. Also note that the sizes of x and b need to be defined:
VectorXd b(A.rows()), x(A.cols());
For larger sparse matrices you may also want to look at the .reserve() function, which allocates memory before the elements are filled. It takes an estimate of the number of non-zero entries per column (or per row, depending on the storage order; the default is column-major). In the example above that estimate is 4, although reserving does not really matter for such a small matrix. The documentation states that it is preferable to overestimate the number of non-zeros per column.
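The remark that the example matrix is not Hermitian can be checked directly. For a real matrix, Hermitian just means symmetric. The following sketch in plain C++ (no Eigen) stores the non-zero entries in a std::map and compares each stored entry with its mirror; the names Entries, entry_at, is_symmetric, and example_matrix are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <map>
#include <utility>

using Entries = std::map<std::pair<int, int>, double>;  // (row, col) -> value

// Returns the stored value, or 0.0 for an entry that is not stored.
double entry_at(const Entries& a, int i, int j) {
    auto it = a.find(std::make_pair(i, j));
    return it == a.end() ? 0.0 : it->second;
}

// A real matrix is Hermitian exactly when A(i,j) == A(j,i) for all i, j.
bool is_symmetric(const Entries& a) {
    for (const auto& kv : a) {
        const int i = kv.first.first, j = kv.first.second;
        if (entry_at(a, j, i) != kv.second) return false;
    }
    return true;
}

// The 4x4 example matrix from the question.
Entries example_matrix() {
    return {{{0, 0}, 1.0}, {{0, 1}, 2.0}, {{1, 1}, 3.0}, {{2, 2}, 4.0},
            {{2, 3}, 5.0}, {{3, 2}, 6.0}, {{3, 3}, 7.0}};
}
```

Here A(0,1) is 2 while A(1,0) is 0, and A(2,3) is 5 while A(3,2) is 6, so the check fails and a Cholesky-type factorization does not apply.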
Since this question also asks about Armadillo, here is the corresponding Armadillo-based code. It is best to use Armadillo version 9.100 or later, and to link with SuperLU.
#include <armadillo>
using namespace arma;
int main()
{
sp_mat A(4,4); // don't need to explicitly reserve the number of non-zeros
// fill with direct element access
A(0,0) = 1.0;
A(0,1) = 2.0;
A(1,1) = 3.0;
A(2,2) = 4.0;
A(2,3) = 5.0;
A(3,2) = 6.0;
A(3,3) = 7.0; // etc
// or load the sparse matrix from a text file with the data stored in coord format
sp_mat AA;
AA.load("my_sparse_matrix.txt", coord_ascii);
vec b; // ... fill b here ...
vec x = spsolve(A,b); // solve sparse system
return 0;
}
See also the documentation for SpMat, element access, .load(), spsolve().
The coord file format is simple. It stores only the non-zero values.
Each line contains:
row col value
The row and column counts start at zero. Example:
0 0 1.0
0 1 2.0
1 1 3.0
2 2 4.0
2 3 5.0
3 2 6.0
3 3 7.0
1000 2000 9.0
Values not explicitly listed are assumed to be zero.
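To make the layout of the coord format concrete, here is a small sketch in plain C++ that reads "row col value" triples from a string. It is not Armadillo's loader; CoordEntry and parse_coord are hypothetical names, and the sketch does no validation beyond stopping at the first token that fails to parse.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct CoordEntry {
    int row;       // zero-based row index
    int col;       // zero-based column index
    double value;  // the non-zero value stored at (row, col)
};

// Reads whitespace-separated "row col value" triples until input ends
// or a token fails to parse.
std::vector<CoordEntry> parse_coord(const std::string& text) {
    std::vector<CoordEntry> entries;
    std::istringstream in(text);
    CoordEntry e;
    while (in >> e.row >> e.col >> e.value) {
        entries.push_back(e);
    }
    return entries;
}
```

Each parsed entry corresponds to one line of the file; any (row, col) pair that never appears is an implicit zero.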
#include <vector>
#include <iostream>
#include <Eigen/Dense>
#include <Eigen/Sparse>
#include <Eigen/Core>
#include <cstdlib>
using namespace Eigen;
using namespace std;
int main()
{
double L = 5; // Length
const int N = 120; // No of cells
double L_cell = L / N;
double k = 100; // Thermal Conductivity
double T_A = 100.;
double T_B = 200.;
double S = 1000.;
Vector<double, N> d, D, A, aL, aR, aP, S_u, S_p;
vector<double> xp;
xp.push_back((0 + L_cell) / 2.0);
double xm = xp[0];
for (int i = 0; i < N - 1; i++)
{
xm = xm + L_cell;
xp.push_back(xm);
}
for (int i = 0; i < N; i++)
{
A(i) = .1;
d(i) = L_cell;
D(i) = k / d(i);
}
aL(0) = 0;
aR(0) = D(0) * A(0);
S_p(0) = -2 * D(0) * A(0);
aP(0) = aL(0) + aR(0) - S_p(0);
S_u(0) = 2 * D(0) * A(0) * T_A + S * L_cell * A(0);
for (int i = 1; i < N - 1; i++)
{
aL(i) = D(i) * A(i);
aR(i) = D(i) * A(i);
S_p(i) = 0;
aP(i) = aL(i) + aR(i) - S_p(i);
S_u(i) = S * A(i) * L_cell;
}
aL(N - 1) = D(N - 1) * A(N - 1);
aR(N - 1) = 0;
S_p(N - 1) = -2 * D(N - 1) * A(N - 1);
aP(N - 1) = aL(N - 1) + aR(N - 1) - S_p(N - 1);
S_u(N - 1) = 2 * D(N - 1) * A(N - 1) * T_B + S * L_cell * A(N - 1);
typedef Eigen::Triplet<double> T;
std::vector<T> tripletList;
tripletList.reserve(N * 3);
Matrix<double, N, 3> v; // v is declared here
v << (-1) * aL, aP, (-1) * aR;
for (int i = 0, j = 0; i < N && j < N; i++, j++)
{
tripletList.push_back(T(i, j, v(i, 1)));
if (i + 1 < N && j + 1 < N)
{
tripletList.push_back(T(i + 1, j, v(i + 1, 0)));
tripletList.push_back(T(i, j + 1, v(i, 2)));
}
}
SparseMatrix<double> coeff(N, N);
coeff.setFromTriplets(tripletList.begin(), tripletList.end());
SimplicialLDLT<SparseMatrix<double> > solver;
solver.compute(coeff);
if (solver.info() != Success) {
cout << "decomposition failed" << endl;
return 1;
}
Vector<double, N> temparature;
temparature = solver.solve(S_u);
if (solver.info() != Success)
{
cout << "Solving failed" << endl;
return 1;
}
vector<double> Te = {}, x = {};
Te.push_back(T_A);
x.push_back(0);
for (int i = 0; i < N; i++)
{
Te.push_back(temparature(i));
x.push_back(xp[i]);
}
Te.push_back(T_B);
x.push_back(L);
for (int i = 0; i < N + 2; i++)
{
cout << x[i] << " " << Te[i] << endl;
}
return 0;
}
The code above is a complete solution to a numerical problem that uses SparseMatrix. Look at the matrix v: it holds the values of all the non-zero elements of the coeff matrix, which has not been defined yet at that point. The loop that follows makes a series of tripletList.push_back(...) calls, each adding a triplet made of a row index, a column index, and the corresponding value taken from v, one per non-zero element of coeff. Finally, a SparseMatrix coeff of the appropriate size is declared and its non-zero elements are set from tripletList with the setFromTriplets method (documentation).
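The triplet assembly for a tridiagonal matrix can be looked at in isolation. The sketch below is plain C++ without Eigen: the Triplet struct is a stand-in for Eigen::Triplet<double>, and the three input vectors play the role of columns 0, 1, and 2 of v. The function name is hypothetical.

```cpp
#include <cassert>
#include <vector>

struct Triplet {   // stand-in for Eigen::Triplet<double>
    int row;
    int col;
    double value;
};

// lower[i], diag[i], upper[i] correspond to v(i, 0), v(i, 1), v(i, 2).
std::vector<Triplet> tridiagonal_triplets(const std::vector<double>& lower,
                                          const std::vector<double>& diag,
                                          const std::vector<double>& upper) {
    const int n = static_cast<int>(diag.size());
    std::vector<Triplet> t;
    t.reserve(3 * n);
    for (int i = 0; i < n; ++i) {
        t.push_back({i, i, diag[i]});               // main diagonal
        if (i + 1 < n) {
            t.push_back({i + 1, i, lower[i + 1]});  // sub-diagonal entry of row i+1
            t.push_back({i, i + 1, upper[i]});      // super-diagonal entry of row i
        }
    }
    return t;
}
```

For an N x N tridiagonal matrix this produces 3N - 2 triplets, which is why reserving N * 3 slots is enough.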

RcppParallel Parallelizing distance computation: segfault

I have a matrix, for which I want to compute the distance (let's say Euclidean) between the ith row and every other row (i.e., I want the ith row of the pairwise distance matrix).
#include <Rcpp.h>
#include <cmath>
#include <algorithm>
#include <RcppParallel.h>
//#include <RcppArmadillo.h>
#include <queue>
using namespace std;
using namespace Rcpp;
using namespace RcppParallel;
// [[Rcpp::export]]
double dist_fun(NumericVector row1, NumericVector row2){
double rval = 0;
for (int i = 0; i < row1.length(); i++){
rval += (row1[i] - row2[i]) * (row1[i] - row2[i]);
}
return rval;
}
// [[Rcpp::export]]
NumericVector dist_row(NumericMatrix mat, int i){
NumericVector row(mat.nrow());
NumericMatrix::Row row1 = mat.row(i - 1);
for (int j = 0; j < mat.nrow(); j++){
NumericMatrix::Row row2 = mat.row(j);
row(j) = dist_fun(row1, row2);
}
return row;
}
// [[Rcpp::depends(RcppParallel)]]
struct JsDistance: public Worker {
// input matrix to read from
const NumericMatrix mat;
int i;
// output vector to write to
NumericVector output;
// initialize from Rcpp input and output matrixes (the RMatrix class
// can be automatically converted to from the Rcpp matrix type)
JsDistance(const NumericMatrix mat, int i, NumericVector output)
: mat(mat), i(i), output(output) {}
// function call operator that work for the specified range (begin/end)
void operator()(std::size_t begin, std::size_t end) {
NumericVector row1 = mat.row(i);
for (std::size_t j = begin; j < end; j++) {
NumericVector row2 = mat.row(j);
output[j] = dist_fun(row1, row2);
}
}
};
// [[Rcpp::export]]
NumericVector parallel_dist_row(NumericMatrix mat, int i) {
// allocate the matrix we will return
NumericVector output(mat.nrow());
// create the worker
JsDistance JsDistance(mat, i, output);
// call it with parallelFor
parallelFor(0, mat.nrow(), JsDistance);
return output;
}
The sequential way using Rcpp is the function 'dist_row' as written above. Yet the matrix I want to work with is very large, so I want to parallelize it. When I do, I run into a segfault that I don't quite understand. To trigger the error you can run the following code:
library(Rcpp)
library(RcppParallel)
setThreadOptions(numThreads = 20)
set.seed(42)
X = matrix(rnorm(10000 * 400), 10000, 400)
sourceCpp("question.cpp")
start1 = proc.time()
print(dist_row(X, 2)[1:30])
print(proc.time() - start1)
start2 = proc.time()
print(parallel_dist_row(X, 2)[1:30])
print(proc.time() - start2)
Can someone give me some hint about what I did wrong? Thanks in advance for your time!
=======================================================================
Edit:
inline double d(double a, double b){
return fabs(a - b);
}
// [[Rcpp::depends(RcppParallel)]]
struct dtwDistance: public Worker {
// Input matrix to read from must be of the RMatrix<T> form
// if using Rcpp objects
const RMatrix<double> mat;
int i;
// Output vector to write to must be of the RVector<T> form
// if using Rcpp objects
RVector<double> output;
// initialize from Rcpp input and output matrixes (the RMatrix class
// can be automatically converted to from the Rcpp matrix type)
dtwDistance(const NumericMatrix mat, int i, NumericVector output)
: mat(mat), i(i - 1), output(output) {}
// Note the -1 ^^^^ to match results from prior function
// Function call operator to iterate over a specified range (begin/end)
void operator()(std::size_t begin, std::size_t end) {
RMatrix<double>::Row row1 = mat.row(i);
for (std::size_t j = begin; j < end; ++j) {
RMatrix<double>::Row row2 = mat.row(j);
size_t n = row1.length();
size_t m = row2.length();
NumericMatrix cost(n + 1, m + 1);
for (int ii = 1; ii <= n; ii++){
cost(i, 0) = numeric_limits<double>::infinity();
}
for (int jj = 1; jj <= m; jj++){
cost(0, j) = numeric_limits<double>::infinity();
}
for (int ii = 1; ii <= n; ii++){
for (int jj = 1; jj <= m; jj++){
double dist = d(row1[ii - 1], row2[jj - 1]);
cost(ii, jj) = dist + min(min(cost(ii - 1, jj), cost(ii, jj - 1)), cost(ii - 1, jj - 1));
//cout << ii << ", " << jj << ", " << cost(ii, jj) << "\n";
}
}
output[j] = cost(n, m);
}
}
};
// [[Rcpp::export]]
NumericVector parallel_dist_row_dtw(NumericMatrix mat, int i) {
// allocate the matrix we will return
//RMatrix<double> input(mat);
NumericVector y(mat.nrow());
//RVector<double> output(y);
// create the worker
dtwDistance dtwDistance(mat, i, y);
// call it with parallelFor
parallelFor(0, mat.nrow(), dtwDistance);
return y;
}
The distance I needed to calculate is the dynamic time warping distance. I implemented it as above. Yet when running, it will give a 'stack imbalance' warning. And there will be a segfault after several runs. I'm wondering what is the problem now.
To trigger the problem, I did:
library(Rcpp)
library(RcppParallel)
setThreadOptions(numThreads = 4)
sourceCpp("scripts/chisq_dtw.cpp")
set.seed(42)
X = matrix(rnorm(1000), 100, 10)
parallel_dist_row_dtw(X, 1)
parallel_dist_row_dtw(X, 2)
parallel_dist_row_dtw(X, 3)
parallel_dist_row_dtw(X, 4)
parallel_dist_row_dtw(X, 5)
The issue is that you are not using the thread-safe wrappers around R objects, RMatrix<T> and RVector<T>. These classes matter because the parallel code runs on background threads, where it is not safe to call the R or Rcpp APIs. The official documentation emphasizes this in the Safe Accessors section.
In particular, we have:
To provide safe and convenient access to the arrays underlying R vectors and matrices RcppParallel introduces several accessor classes:
RVector<T> — Wrap R vectors of various types
RMatrix<T> — Wrap R matrices of various types (also includes Row and Column classes)
To create a thread safe accessor for an Rcpp vector or matrix just construct an instance of RVector or RMatrix with it.
Code Fix
So, your work can be fixed by switching *Matrix to RMatrix<T> and *Vector to RVector<T>.
struct JsDistance: public Worker {
// Input matrix to read from must be of the RMatrix<T> form
// if using Rcpp objects
const RMatrix<double> mat;
int i;
// Output vector to write to must be of the RVector<T> form
// if using Rcpp objects
RVector<double> output;
// initialize from Rcpp input and output matrixes (the RMatrix class
// can be automatically converted to from the Rcpp matrix type)
JsDistance(const NumericMatrix mat, int i, NumericVector output)
: mat(mat), i(i - 1), output(output) {}
// Note the -1 ^^^^ to match results from prior function
// Function call operator to iterate over a specified range (begin/end)
void operator()(std::size_t begin, std::size_t end) {
RMatrix<double>::Row row1 = mat.row(i);
for (std::size_t j = begin; j < end; ++j) {
RMatrix<double>::Row row2 = mat.row(j);
double rval = 0;
for (unsigned int k = 0; k < row1.length(); ++k) {
rval += (row1[k] - row2[k]) * (row1[k] - row2[k]);
}
output[j] = rval;
}
}
};
In particular, the data types used here are of the form RMatrix<double> even for accessing the matrix.
Also, the parallelized version is missing the i - 1 adjustment. To remedy this, I've opted to have it taken care of in the constructor of JsDistance.
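To illustrate the idea behind the accessor classes, the following sketch in plain C++ shows a worker that touches nothing but plain memory. It is an assumption of this sketch, not a description of RcppParallel's internals, that the matrix is handed over as a pointer to column-major data; RowDistanceWorker is a hypothetical name, and the two direct calls to operator() stand in sequentially for the chunks parallelFor would dispatch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of a worker that only reads and writes plain memory (no R or Rcpp calls).
struct RowDistanceWorker {
    const double* mat;    // column-major data, laid out like an R matrix
    std::size_t nrow;
    std::size_t ncol;
    std::size_t i;        // zero-based reference row
    double* output;       // one slot per row

    // Handles rows [begin, end); distinct ranges write to distinct slots.
    void operator()(std::size_t begin, std::size_t end) const {
        for (std::size_t j = begin; j < end; ++j) {
            double rval = 0.0;
            for (std::size_t k = 0; k < ncol; ++k) {
                const double diff = mat[i + k * nrow] - mat[j + k * nrow];
                rval += diff * diff;
            }
            output[j] = rval;
        }
    }
};
```

Because each index j is written by exactly one chunk and nothing is allocated inside operator(), the chunks can run in any order or concurrently.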
Test
set.seed(42)
X = matrix(rnorm(10000 * 400), 10000, 400)
start1 = proc.time()
print(dist_row(X, 2)[1:30])
# [1] 811.8873 0.0000 799.8153 810.1442 720.3232 730.6083 797.8441 781.8066 827.1511 834.1863 842.9392 850.2476 724.5842 673.1428 775.0994
# [16] 805.5752 804.9281 774.9770 799.7669 870.3187 815.1129 934.7581 726.1554 804.2097 758.4943 772.8931 806.6026 715.8257 847.8980 831.7555
print(proc.time() - start1)
# user system elapsed
# 0.22 0.00 0.23
start2 = proc.time()
print(parallel_dist_row(X, 2)[1:30])
# [1] 811.8873 0.0000 799.8153 810.1442 720.3232 730.6083 797.8441 781.8066 827.1511 834.1863 842.9392 850.2476 724.5842 673.1428 775.0994
# [16] 805.5752 804.9281 774.9770 799.7669 870.3187 815.1129 934.7581 726.1554 804.2097 758.4943 772.8931 806.6026 715.8257 847.8980 831.7555
print(proc.time() - start2)
# user system elapsed
# 0.28 0.00 0.06
all.equal(parallel_dist_row(X, 2), dist_row(X, 2))
# [1] TRUE
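The dynamic time warping variant from the question's edit has two further problems beyond the accessors. It constructs a NumericMatrix inside operator(), which is an Rcpp allocation on a worker thread, and the two initialization loops index with i and j instead of ii and jj. The following is a sketch in plain C++ of the inner computation with both points addressed: the cost table lives in a std::vector and the loops use their own counters. The name dtw_distance is hypothetical; inside a worker, row1 and row2 would be read through RMatrix<double>::Row rather than std::vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// DTW distance with |a - b| as the local cost, as in the question's d().
double dtw_distance(const std::vector<double>& a, const std::vector<double>& b) {
    const std::size_t n = a.size(), m = b.size();
    const double inf = std::numeric_limits<double>::infinity();

    // (n+1) x (m+1) cost table stored row-major; cost(0, 0) stays 0.
    std::vector<double> cost((n + 1) * (m + 1), 0.0);
    auto at = [&](std::size_t r, std::size_t c) -> double& {
        return cost[r * (m + 1) + c];
    };

    for (std::size_t ii = 1; ii <= n; ++ii) at(ii, 0) = inf;  // uses ii, not i
    for (std::size_t jj = 1; jj <= m; ++jj) at(0, jj) = inf;  // uses jj, not j

    for (std::size_t ii = 1; ii <= n; ++ii) {
        for (std::size_t jj = 1; jj <= m; ++jj) {
            const double dist = std::fabs(a[ii - 1] - b[jj - 1]);
            at(ii, jj) = dist + std::min(std::min(at(ii - 1, jj), at(ii, jj - 1)),
                                         at(ii - 1, jj - 1));
        }
    }
    return at(n, m);
}
```

Allocating a std::vector per pair is safe on a background thread, since it does not go through R's memory manager.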

Rcpp loop update variable inside

I am a new user of Rcpp and I am writing a package.
I have defined two functions in one script and am trying to call one from the other inside a loop.
One of my functions is defined as below:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
double timesTwo(colvec x, NumericVector group, double k,
NumericVector unique_group)
{
vec beta(x.begin(),x.size(),false);
vec Group(group.begin(),group.size(),false);
vec unigroup(unique_group.begin(),unique_group.size(),false);
beta = pow(beta,k);
int g = unigroup.size();
int j = 0;
uvec st ;
double b=0;
for(j = 0; j < g; j++)
{
st = find(Group == unigroup[j]);
b = b + abs(pow(sum(beta.elem(st)),1/k));
}
double s = b;
return s;
}
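For reference, the quantity timesTwo computes can be written without Armadillo. The sketch below is plain C++ under the assumption that the intent is a group-wise penalty: for each group label, sum beta_i^k over the members of the group, take the 1/k power, and add up the absolute values. The name group_penalty is hypothetical, and std::fabs is used so that the floating-point absolute value is selected explicitly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sum over groups g of | (sum_{i in g} beta_i^k)^(1/k) |.
double group_penalty(const std::vector<double>& beta,
                     const std::vector<double>& group,
                     double k,
                     const std::vector<double>& unique_group) {
    double b = 0.0;
    for (double g : unique_group) {
        double s = 0.0;
        for (std::size_t i = 0; i < beta.size(); ++i) {
            if (group[i] == g) s += std::pow(beta[i], k);  // members of group g
        }
        b += std::fabs(std::pow(s, 1.0 / k));
    }
    return b;
}
```

A function of this shape takes its input by const reference and returns a double, so it can be called directly from other C++ code in the same file.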
And the loop I use to call this function is like below:
XtXi_beta_plus.col(i) = X_MAT * BETA_NEW.col(2*i);
XtXi_beta_minus.col(i) = X_MAT * BETA_NEW.col(2*i+1);
loss_new_1.col(i) = (Y - ited / (ited + exp( -XtXi_beta_plus.col(i))));
loss_new_2.col(i) = (Y - ited / (ited + exp( -XtXi_beta_minus.col(i))));
new_loss(2*i) = accu(loss_new_1.col(i) % loss_new_1.col(i));
new_loss(2*i+1) = accu(loss_new_2.col(i) % loss_new_2.col(i));
z = BETA_NEW.col(2*i);
w = BETA_NEW.col(2*i+1);
// when 88 line was change to BETA_NEW.col(2*i) there is an error
// if you keep use Z, there is no update
// best!
pen_new_positive(i) = as<double>(timesTwo(z,group,k,unique_group));
My question is the one in the comment inside the loop: I want to update pen_new_positive(i) based on BETA_NEW.col(2*i). However, when I pass BETA_NEW.col(2*i) directly to the timesTwo function, no matter how I change the input
type of the function (colvec, mat, or whatever), there is an error like the one below:
cannot convert "const::arma::subview_col<double> to "SEXP" in
initialization"
However, when I pass z to the timesTwo function instead, z is not updated in the loop.
Could anyone give me a hint about how to deal with this?
The full version of the code in the second block is below:
#include <RcppArmadillo.h>
#include <math.h>
//#include <omp.h>
using namespace Rcpp;
using namespace arma;
//// [[Rcpp::plugins(openmp)]]
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
List minghan21041(NumericMatrix beta_new, NumericVector diff_loss,
NumericVector beta, double step_size, NumericVector y,
double k,
NumericVector group,
NumericVector unique_group,
NumericMatrix X) {
int n = X.nrow(), p=X.ncol() , n1=beta_new.nrow(), p1=beta_new.ncol();
mat X_MAT(X.begin(),n,p,false), BETA_NEW(beta_new.begin(),n1,p1,false);
vec BETA(beta.begin(),beta.size(),false);
vec Y(y.begin(),y.size(),false),
Diff_Loss(diff_loss.begin(),diff_loss.size(),false), iter(p,fill::zeros);
mat XtXi_beta_plus(n,p,fill::zeros);
mat XtXi_beta_minus(n,p,fill::zeros);
vec ited(n,fill::ones);
mat loss_new_1(n,p, fill::zeros),loss_new_2(n,p, fill::zeros);
colvec loss_new_3(n,fill::zeros);
vec new_loss(p1,fill::zeros);
uword index;
vec beta_final(p,fill::zeros);
vec pen_new_positive(p,fill::zeros);
vec pen_new_negative(p,fill::zeros);
double pen_old = 0;
Function timesTwo( "timesTwo" );
double cs=0;
//Col dc(BETA.begin(),p,1);
//pen_old = as<double>(timesTwo(dc,group,k,unique_group));
vec z(p,fill::zeros);
vec w(p,fill::zeros);
int i = 0;
vec XtXi_beta_old = X_MAT*BETA;
loss_new_3= (Y - ited / (ited + exp(-XtXi_beta_old)));
double loss_old_one = accu(loss_new_3%loss_new_3);
//#pragma omp parallel private(i) num_threads(4)
//{
//#pragma omp for ordered schedule(static,1)
for(i= 0; i<p; i++){
//#pragma omp ordered
//{
iter(i) = step_size;
BETA_NEW.col(2*i) = BETA + iter;
BETA_NEW.col(2*i+1) = BETA - iter;
XtXi_beta_plus.col(i) = X_MAT * BETA_NEW.col(2*i);
XtXi_beta_minus.col(i) = X_MAT * BETA_NEW.col(2*i+1);
loss_new_1.col(i) = (Y - ited / (ited + exp( -XtXi_beta_plus.col(i))));
loss_new_2.col(i) = (Y - ited / (ited + exp( -XtXi_beta_minus.col(i))));
new_loss(2*i) = accu(loss_new_1.col(i) % loss_new_1.col(i));
new_loss(2*i+1) = accu(loss_new_2.col(i) % loss_new_2.col(i));
z = BETA_NEW.col(2*i);
w = BETA_NEW.col(2*i+1);
// when 88 line was change to BETA_NEW.col(2*i) there is an error
// if you keep use Z, there is no update, you can source this file and I
//believe there is no other error
// best!
pen_new_positive(i) = as<double>(timesTwo(BETA_NEW.col(2*i),group,k,unique_group));
cs = pen_new_positive(i);
//Rcout << "cs" << cs << std::endl;
Rcout << "cs" << z << std::endl;
//pen_new_negative = as< std::vector<double> >(time(w,group,k,unique_group));
Diff_Loss(2*i) = new_loss(2*i) - loss_old_one + cs;
Diff_Loss(2*i+1) = new_loss(2*i+1) - loss_old_one + cs;
iter(i) = 0;
}
//}
//}
index = Diff_Loss.index_min();
beta_final = BETA_NEW.col(index);
return List::create( _["index"] = wrap(index),
_["Diff_Loss"]= wrap(Diff_Loss[index]),
_["ste"] =wrap(Diff_Loss),
_["beta_new"] = wrap(beta_final),
_["New_LOSS"]= wrap(new_loss[index]),
_["t"] = wrap(pen_new_positive));
}

Rcpp: my distance matrix program is slower than the function in package

I would like to calculate the pairwise Euclidean distance matrix. I wrote the following Rcpp program at the suggestion of Dirk Eddelbuettel:
NumericMatrix calcPWD1 (NumericMatrix x){
int outrows = x.nrow();
double d;
NumericMatrix out(outrows,outrows);
for (int i = 0 ; i < outrows - 1; i++){
for (int j = i + 1 ; j < outrows ; j ++){
NumericVector v1= x.row(i);
NumericVector v2= x.row(j);
NumericVector v3=v1-v2;
d = sqrt(sum(pow(v3,2)));
out(j,i)=d;
out(i,j)=d;
}
}
return (out) ;
}
But I find that my program is slower than the dist function.
> benchmark(as.matrix(dist(b)),calcPWD1(b))
test replications elapsed relative user.self sys.self user.child sys.child
1 as.matrix(dist(b)) 100 24.831 1.000 24.679 0.010 0 0
2 calcPWD1(b) 100 27.362 1.102 27.346 0.007 0 0
Do you have any suggestions? My matrix is very simple. There are no column names or row names; it is just a plain matrix (for example b=matrix(c(rnorm(1000*10)),1000,10)).
Here is the source of dist:
> dist
function (x, method = "euclidean", diag = FALSE, upper = FALSE,
p = 2)
{
if (!is.na(pmatch(method, "euclidian")))
method <- "euclidean"
METHODS <- c("euclidean", "maximum", "manhattan", "canberra",
"binary", "minkowski")
method <- pmatch(method, METHODS)
if (is.na(method))
stop("invalid distance method")
if (method == -1)
stop("ambiguous distance method")
x <- as.matrix(x)
N <- nrow(x)
attrs <- if (method == 6L)
list(Size = N, Labels = dimnames(x)[[1L]], Diag = diag,
Upper = upper, method = METHODS[method], p = p, call = match.call(),
class = "dist")
else list(Size = N, Labels = dimnames(x)[[1L]], Diag = diag,
Upper = upper, method = METHODS[method], call = match.call(),
class = "dist")
.Call(C_Cdist, x, method, attrs, p)
}
<bytecode: 0x56b0d40>
<environment: namespace:stats>
I expected my program to be faster than dist, since dist has to check many things (like method and diag).
Rcpp vs. Internal R Functions (C/Fortran)
First of all, just because you are writing the algorithm using Rcpp does not necessarily mean it will beat out the R equivalent, especially if the R function calls a C or Fortran routine to perform the bulk of the computations. In other cases where the function is written purely in R, there is a high probability that transforming it in Rcpp will yield the desired speed gain.
Remember, when rewriting internal functions, one is going up against the R Core team of absolutely insane C programmers, who most likely will win out.
Base Implementation of dist()
Secondly, the distance calculation R uses is done in C as indicated by:
.Call(C_Cdist, x, method, attrs, p)
, which is the last line of the dist() function's R source. This gives it a slight advantage vs. C++ as it is more granular instead of templated.
Furthermore, the C implementation uses OpenMP when available to parallelize the computation.
Proposed modification
Thirdly, by changing the subset order slightly and avoiding the creation of an additional variable, the gap between the timings of the two versions decreases.
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::NumericMatrix calcPWD1 (const Rcpp::NumericMatrix & x){
unsigned int outrows = x.nrow(), i = 0, j = 0;
double d;
Rcpp::NumericMatrix out(outrows,outrows);
for (i = 0; i < outrows - 1; i++){
Rcpp::NumericVector v1 = x.row(i);
for (j = i + 1; j < outrows ; j ++){
d = sqrt(sum(pow(v1-x.row(j), 2.0)));
out(j,i)=d;
out(i,j)=d;
}
}
return out;
}
You were almost there. But your inner loop body tried to do too much in one line. Template programming is hard enough as it is, and sometimes it is just better to spread instructions out a little to give the compiler a better chance. So I just made it five statements, and it built immediately.
New code:
#include <Rcpp.h>
using namespace Rcpp;
double dist1 (NumericVector x, NumericVector y){
int n = y.length();
double total = 0;
for (int i = 0; i < n ; ++i) {
total += pow(x(i)-y(i),2.0);
}
total = sqrt(total);
return total;
}
// [[Rcpp::export]]
NumericMatrix calcPWD (NumericMatrix x){
int outrows = x.nrow();
int outcols = x.nrow();
NumericMatrix out(outrows,outcols);
for (int i = 0 ; i < outrows - 1; i++){
for (int j = i + 1 ; j < outcols ; j ++) {
NumericVector v1 = x.row(i);
NumericVector v2 = x.row(j-1);
double d = dist1(v1, v2);
out(j-1,i) = d;
out(i,j-1)= d;
}
}
return (out) ;
}
/*** R
M <- matrix(log(1:9), 3, 3)
calcPWD(M)
*/
Running it:
R> sourceCpp("/tmp/mikebrown.cpp")
R> M <- matrix(log(1:9), 3, 3)
R> calcPWD(M)
[,1] [,2] [,3]
[1,] 0.000000 0.740322 0
[2,] 0.740322 0.000000 0
[3,] 0.000000 0.000000 0
R>
You may want to check your indexing logic though: the third row and column of the result are all zero, so it looks like some comparisons are missed.
Edit: For kicks, here is a more compact version of your distance function:
// [[Rcpp::export]]
double dist2(NumericVector x, NumericVector y){
double d = sqrt( sum( pow(x - y, 2) ) );
return d;
}
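As a cross-check of the indexing, here is a sketch of the full pairwise computation in plain C++ (no Rcpp), assuming the matrix is passed as a column-major buffer the way R stores it. The inner index j is used directly for both the row lookup and the output position, so every pair (i, j) with i < j is filled and mirrored. The name pairwise_distances is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// x holds an n x p matrix in column-major order.
// Returns the n x n Euclidean distance matrix, also column-major.
std::vector<double> pairwise_distances(const std::vector<double>& x,
                                       std::size_t n, std::size_t p) {
    std::vector<double> out(n * n, 0.0);
    for (std::size_t i = 0; i + 1 < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            double total = 0.0;
            for (std::size_t k = 0; k < p; ++k) {
                const double diff = x[i + k * n] - x[j + k * n];
                total += diff * diff;          // squared difference, no pow()
            }
            const double d = std::sqrt(total);
            out[j + i * n] = d;                // entry (j, i)
            out[i + j * n] = d;                // entry (i, j)
        }
    }
    return out;
}
```

With the 3 x 3 matrix of log(1:9) used above, the distance between rows 1 and 2 agrees with the 0.740322 printed in the output, and the third row and column are filled in as well.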