no matching function for call in rcpp - c++

When using Rcpp, I created a function named rpois_rcpp and tried to call it in the genDataList function below, but compilation fails with this error:
no matching function for call to 'cpprbinom'
candidate function not viable: no known conversion from 'arma::vec' (aka 'Col') to 'Rcpp::NumericVector' (aka 'Vector<14>') for 3rd argument
arma::vec cpprbinom(int n, double size, NumericVector prob)
Can someone help me? Thanks!
Here is my code:
//create a random matrix X with covariance matrix sigma
// [[Rcpp::export]]
arma::mat mvrnormArma(const int n, arma::vec mu, const int p, const
double rho) {
arma::mat sigma(p, p, arma::fill::zeros);
for (int i = 0; i < sigma.n_rows; ++i) {
for (int j = 0; j < sigma.n_cols; ++j) {
sigma(i,j) = pow(rho, abs((i + 1) - (j + 1)));
}
}
int ncols = sigma.n_cols;
arma::mat Y = arma::randn(n, ncols);
return arma::repmat(mu, 1, n).t() + Y * arma::chol(sigma);
}
//create a vector sampled from poisson distribution with mean vector
//lambda
// [[Rcpp::export]]
arma::vec rpois_rcpp( NumericVector &lambda) {
int n= lambda.length();
unsigned int lambda_i = 0;
IntegerVector sim(n);
for (unsigned int i = 0; i < n; i++) {
sim[i] = R::rpois(lambda[lambda_i]);
// update lambda_i to match next realized value with correct mean
lambda_i++;
}
return as<arma::vec>(sim);
}
//create a vector sampled from binomial distribution with probability vector prob
// [[Rcpp::export]]
arma::vec cpprbinom(int n, double size, NumericVector prob) {
NumericVector v = no_init(n);
std::transform( prob.begin(), prob.end(), v.begin(), [=](double p){
return R::rbinom(size, p); });
return as<arma::vec>(v);}
// [[Rcpp::export]]
List genDataList(int n, arma::vec& mu, int p, double rho,
arma::vec& beta, const double SNR, const std::string &
Test_case) {
arma::mat U, V, data, normData, Projection;
arma::vec s, y, means, noise;
data = mvrnormArma(n, mu, p, rho);
normData = arma::normalise(data,2,0);
arma::svd_econ(U,s,V,normData,"right");
Projection = V * trans(V);
beta = Projection * beta;
if(Test_case == "gaussian")
{
means=normData * beta;
y = means + arma::randn(n) * sqrt(arma::var(means) / SNR);}
else if (Test_case == "poisson")
{
means=exp(normData * beta);
y = rpois_rcpp(means);}
else
{
means=exp(normData * beta)/(1 + exp(normData * beta));
y = cpprbinom(n,1,means);}
List ret;
ret["data"] = data;
ret["normData"] = normData;
ret["V"] = V;
ret["beta"] = beta;
ret["y"] = y;
return ret;
}

Thanks for adding your code. When I tried to compile, I got the same error as you, but also an error for the line calling rpois_rcpp()
invalid initialization of reference to type 'Rcpp::NumericVector&'
Pretty much everything seems to be in arma, except the R bindings and calls to the R:: namespace, which takes doubles, ints, etc. It seems the easiest thing to do (to my mind), is just take arma::vec as arguments instead:
arma::vec rpois_rcpp( arma::vec &lambda) {
int n= lambda.n_elem;
and
arma::vec cpprbinom(int n, double size, arma::vec prob) {
You never utilize the fact that lambda and prob are Rcpp::NumericVectors specifically, you just use doubles from them, so this seems the easiest route to me. After those changes, your code compiles fine on my machine. I don't have any test cases to make sure they run as you'd expect, but I imagine they will.
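As a standalone illustration of why the container type does not matter here, the sketch below draws one binomial value per probability using only the doubles stored in the container. It is not Rcpp code: std::vector<double> stands in for arma::vec, std::binomial_distribution stands in for R::rbinom, and the name rbinom_each is made up for this example.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Stand-in for arma::vec (assumption: only element access is needed).
using Vec = std::vector<double>;

// One binomial draw per probability, analogous in spirit to cpprbinom().
// std::binomial_distribution replaces R::rbinom in this sketch.
Vec rbinom_each(int size, const Vec& prob, std::mt19937& rng) {
    Vec out;
    out.reserve(prob.size());
    for (double p : prob) {
        assert(p >= 0.0 && p <= 1.0);  // each entry must be a probability
        std::binomial_distribution<int> draw(size, p);
        out.push_back(static_cast<double>(draw(rng)));
    }
    return out;
}
```

Because the loop body only ever sees a double, swapping the parameter type between containers changes the signature but not the logic, which is why taking arma::vec directly is enough to resolve the conversion error.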

Related

Any tips for how to speed up this Rcpp / C++ code?

I am new to Rcpp and am looking to make the following code as fast as possible. Any tips would be greatly appreciated.
It is running a lot slower than I would like.
I have tried vectorising as much as possible, but I'm not sure how to vectorise further.
// [[Rcpp::export]]
arma::mat fn_update_log_posterior(arma::mat current_logdiffs_set_mat,
int K,
int N,
int l_class,
int ord_test,
int n_bin_tests,
arma::mat first_cutpoint_mat,
arma::mat prev,
arma::cube prob_test,
arma::mat prob,
arma::vec class_ind,
arma::cube Xbeta,
double prior_ind_dir,
arma::mat yp1,
arma::mat y,
double prior_densities) {
arma::mat current_cutpoints_set_full = fn_calculate_cutpoints(current_logdiffs_set_mat, first_cutpoint_mat, K);
arma::mat log_prob(1,1);
arma::vec lp(N);
arma::vec lower_ord_inv_prob(N);
arma::vec upper_ord_inv_prob(N);
arma::vec lower_ord_prob(N);
arma::vec upper_ord_prob(N);
arma::mat prob_test_n(prob_test.n_cols, prob_test.n_slices);
for (int n = 0; n < N;++ n) {
upper_ord_inv_prob.at(n) = current_cutpoints_set_full.at(yp1.at(n, ord_test)) - Xbeta.at(l_class , n , ord_test);
lower_ord_inv_prob.at(n) = current_cutpoints_set_full.at(y.at(n, ord_test)) - Xbeta.at(l_class, n , ord_test);
}
upper_ord_prob = fn_Phi_approx_vec_2(upper_ord_inv_prob);
lower_ord_prob = fn_Phi_approx_vec_2(lower_ord_inv_prob);
for (int n = 0; n < N; ++n) {
prob_test.at(n, ord_test, l_class) = log( upper_ord_prob(n) - lower_ord_prob(n) );
prob_test_n = prob_test.row(n);
prob.at(n,l_class) = sum(prob_test_n.col(l_class)) + log(prev.at(0, l_class));
lp.at(n) = prob.at(n,class_ind(n) - 1); // works for any number of classes
}
log_prob(0,0) = sum(lp) + prior_densities;
return(log_prob);
}

How do I implement the numerical differentiation f'(x) = (f(x+h) - f(x)) / h

2nd task:
For a function f : R^n → R, the gradient at a point x ∈ R^n is to be calculated:
- Implement a function
CMyVector gradient(CMyVector x, double (*function)(CMyVector x)),
which receives the location x as its first parameter and the function f, as a function pointer, as its second parameter, and which calculates the gradient g = grad f(x) numerically by
g_i = ( f(x_1, ..., x_{i-1}, x_i + h, x_{i+1}, ..., x_n) - f(x_1, ..., x_n) ) / h
for fixed h = 10^-8.
My currently written program:
Header
#pragma once
#include <vector>
#include <math.h>
class CMyVektor
{
private:
/* data */
int Dimension = 0;
std::vector<double>Vector;
public:
CMyVektor();
~CMyVektor();
//Public Method
void set_Dimension(int Dimension /* current dimension */);
void set_specified_Value(int index, int Value);
double get_specified_Value(int key);
int get_Vector_Dimension();
int get_length_Vektor();
double& operator [](int index);
std::string umwandlung(); // "Umwandlung" = conversion (to a string); needs #include <string>
};
CMyVektor::CMyVektor(/* args */)
{
Vector.resize(0, 0);
}
CMyVektor::~CMyVektor()
{
// Nothing to release: std::vector<double> frees its own storage,
// and calling delete on a double element does not compile.
}
void CMyVektor::set_Dimension(int Dimension /* current dimension */)
{
Vector.resize(Dimension);
};
void CMyVektor::set_specified_Value(int index, int Value)
{
if (Vector.empty())
{
Vector.push_back(Value);
}
else {
Vector[index] = Value;
}
};
double CMyVektor::get_specified_Value(int key)
{
// over the range from the beginning to the end of the vector
for (unsigned i = 0; i < Vector.size(); i++)
{
if (Vector[i] == key) {
return Vector[i];
}
}
};
int CMyVektor::get_Vector_Dimension()
{
return Vector.size();
};
// Computes the magnitude ("length") of a vector.
int CMyVektor::get_length_Vektor()
{
double length = 0;
for (size_t i = 0; i < Vector.size(); i++)
{
// ^ is bitwise XOR in C++, not a power operator, so square by multiplying
length += Vector[i] * Vector[i];
}
return sqrt(length);
}
// Overload the [] operator
double& CMyVektor::operator [](int index)
{
return Vector[index];
}
main.cpp
#include <iostream>
#include "ClassVektor.h"
using namespace std;
CMyVektor operator+(CMyVektor a, CMyVektor b);
CMyVektor operator*(double lambda, CMyVektor a);
CMyVektor gradient(CMyVektor x, double (*funktion)(CMyVektor x));
int main() {
CMyVektor V1;
CMyVektor V2;
CMyVektor C;
C.set_Dimension(V1.get_length_Vector());
C= V1 + V2;
std::cout << "Addition : "<< "(";;
for (int i = 0; i < C.get_length_Vector(); i++)
{
std::cout << C[i] << " ";
}
std::cout << ")" << endl;
C = lamda * C;
std::cout << "Skalarprodukt: "<< C[0]<< " ";
}
// Vector Addition
CMyVektor operator+(CMyVektor a, CMyVektor b)
{
int ai = 0, bi = 0;
int counter = 0;
CMyVektor c;
c.set_Dimension(a.get_length_Vector());
// If the dimensions are equal, add
if (a.get_length_Vector() == b.get_length_Vector())
{
while (counter < a.get_length_Vector())
{
c[counter] = a[ai] + b[bi];
counter++;
}
return c;
}
}
// Computes the scalar product
CMyVektor operator*(double lambda, CMyVektor a)
{
CMyVektor c;
c.set_Dimension(1);
for (unsigned i = 0; i < a.get_length_Vector(); i++)
{
c[0] += lambda * a[i];
}
return c;
}
/*
* Difference quotient: (F(x0+h) - F(x0)) / h
* First parameter: the point x. Second parameter: the function.
* Determines the gradient numerically.
*/
CMyVektor gradient(CMyVektor x, double (*funktion)(CMyVektor x))
{
}
My problem now is that I don't quite know how to deal with the
CMyVector gradient(CMyVector x, double (*function)(CMyVector x))
function and how to define a function that corresponds to it.
I hope that it is enough information. Many thanks.
The function parameter is the f in the difference formula. It takes a CMyVector parameter x and returns a double value. You need to supply a function parameter name. I'll assume func for now.
I don't see a parameter for h. Are you going to pass a single small value into the gradient function or assume a constant?
The parameter x is a vector. Will you add a constant h to each element?
This function specification is a mess.
Function returns a double. How do you plan to turn that into a vector?
No wonder you're confused. I am.
Are you trying to do something like this?
You are given a function signature
CMyVector gradient(CMyVector x, double (*function)(CMyVector x))
Without knowing the exact definition I will assume, that at least the basic numerical vector operations are defined. That means, that the following statements compile:
CMyVector x {2.,5.,7.};
CMyVector y {1.,7.,4.};
CMyVector z {0.,0.,0.};
double a = 0.;
// vector addition and assigment
z = x + y;
// vector scalar multiplication and division
z = z * a;
z = x / 0.1;
Also we need to know the dimension of the CMyVector class. I assumed and will continue to do so that it is three dimensional.
The next step is to understand the function signature. You get two parameters. The first one denotes the point, at which you are supposed to calculate the gradient. The second is a pointer to the function f in your formula. You do not know it, but can call it on a vector from within your gradient function definition. That means, inside of the definition you can do something like
double f_at_x = function(x);
and the f_at_x will hold the value f(x) after that operation.
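To make the function-pointer part concrete, here is a minimal sketch of defining a function that matches such a signature and calling it through the pointer. The full CMyVector class is not shown in the question, so std::vector<double> is used as a stand-in, and the names Vec, sum_of_squares and evaluate_at are made up for this illustration.

```cpp
#include <cassert>
#include <vector>

// Stand-in for CMyVector (assumption: the real class behaves like an array of doubles).
using Vec = std::vector<double>;

// An example f : R^n -> R, here f(x) = x_1^2 + ... + x_n^2.
double sum_of_squares(Vec x) {
    double s = 0.0;
    for (double xi : x) s += xi * xi;
    return s;
}

// Receives a point and a pointer to f, and evaluates f at that point.
// The caller decides which function is passed in.
double evaluate_at(Vec x, double (*function)(Vec x)) {
    assert(function != nullptr);
    return function(x);
}
```

Any free function with the signature double name(Vec) can be passed this way; the gradient function would call the pointer once at x and once at each shifted point.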
Armed with this, we can try to implement the formula, that you mentioned in the question title:
CMyVector gradient(CMyVector x, double (*function)(CMyVector x)) {
double h = 0.001;
// calculate first element of the gradient
CMyVector e1 {1.0, 0.0, 0.0};
double result1 = ( function(x + e1*h) - function(x) )/h;
// calculate second element of the gradient
CMyVector e2 {0.0, 1.0, 0.0};
double result2 = ( function(x + e2*h) - function(x) )/h;
// calculate third element of the gradient
CMyVector e3 {0.0, 0.0, 1.0};
double result3 = ( function(x + e3*h) - function(x) )/h;
// return the result
return CMyVector {result1, result2, result3};
}
There are several things worth mentioning in this code. First and most important, I have chosen h = 0.001. This may look like a very arbitrary choice, but the choice of step size strongly affects the precision of your result. You can find a lot of discussion about that topic here. I took the same value that, according to that Wikipedia page, a lot of handheld calculators use internally. That might not be the best choice for the floating-point precision of your processor, but it should be a fair one to start with.
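The effect of the step size can be checked with a small experiment. The sketch below is an illustration rather than part of the assignment: it applies the forward difference to sin, whose exact derivative cos is known, so the error for different h can be compared. The helper names are made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Wrapper so that a plain function pointer to sin can be passed around.
double my_sin(double x) { return std::sin(x); }

// Forward difference (f(x + h) - f(x)) / h for a scalar function.
double forward_difference(double (*f)(double), double x, double h) {
    assert(h > 0.0);
    return (f(x + h) - f(x)) / h;
}

// Absolute error of the forward difference of sin at x,
// measured against the exact derivative cos(x).
double sin_derivative_error(double x, double h) {
    return std::fabs(forward_difference(my_sin, x, h) - std::cos(x));
}
```

With double precision, a large h such as 1e-3 suffers from truncation error (the formula is only first-order accurate), while a very small h such as 1e-14 suffers from cancellation in f(x + h) - f(x). A value near 1e-8, as prescribed in the task, sits between the two for the forward difference.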
Secondly, the code looks very ugly to an advanced programmer. We are doing almost the same thing for each of the three dimensions. Usually you would do that in a for loop. Exactly how this is done depends on how the CMyVector type is defined.
Since CMyVektor essentially reimplements the valarray container, I will use valarray directly:
#include <iostream>
#include <valarray>
using namespace std;
using CMyVektor = valarray<double>;
CMyVektor gradient(CMyVektor x, double (*funktion)(CMyVektor x));
const double h = 0.00000001;
int main()
{
// sum(x_i^2 + x_i)--> gradient: 2*x_i + 1
auto fun = [](CMyVektor x) {return (x*x + x).sum();};
CMyVektor d = gradient(CMyVektor{1,2,3,4,5}, fun);
for (auto i: d) cout << i<<' ';
return 0;
}
CMyVektor gradient(CMyVektor x, double (*funktion)(CMyVektor x)){
CMyVektor grads(x.size());
CMyVektor pos(x.size());
for (int i = 0; i<x.size(); i++){
pos[i] = 1;
grads[i] = (funktion(x + h * pos) - funktion(x))/ h;
pos[i] = 0;
}
return grads;
}
This prints 3 5 7 9 11, which is what is expected for the given function at the given location.
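The printed values look exact only because cout rounds to six significant digits by default. A sketch of a stricter check, comparing the numerical gradient against the analytic one (2*x_i + 1) with a tolerance, could look like this. The names forward_gradient, example_f and max_abs_error are made up for this illustration; the variant also evaluates f(x) once instead of once per dimension.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <valarray>

using Vec = std::valarray<double>;

// Forward-difference gradient; f(x) is evaluated once and reused.
Vec forward_gradient(const Vec& x, double (*f)(Vec), double h) {
    assert(h > 0.0);
    Vec grad(x.size());
    const double fx = f(x);
    for (std::size_t i = 0; i < x.size(); ++i) {
        Vec shifted = x;   // copy of x with only component i moved by h
        shifted[i] += h;
        grad[i] = (f(shifted) - fx) / h;
    }
    return grad;
}

// Same test function as above: sum(x_i^2 + x_i), gradient 2*x_i + 1.
double example_f(Vec x) { return (x * x + x).sum(); }

// Largest deviation between the numerical and the analytic gradient.
double max_abs_error(const Vec& x, double h) {
    const Vec grad = forward_gradient(x, example_f, h);
    double worst = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double exact = 2.0 * x[i] + 1.0;
        const double err = std::fabs(grad[i] - exact);
        if (err > worst) worst = err;
    }
    return worst;
}
```

For the point (1, 2, 3, 4, 5) and h = 1e-8 the deviation is small but not zero, so a comparison with a tolerance is safer than testing for equality.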

Pass a function as argument, without knowledge of number of arguments of this function [duplicate]

Long-time browser, first-time asker here. I've written a number of scripts for doing various 1D numerical integration methods and compiled them into a library. I would like that library to be as flexible as possible regarding what it is capable of integrating.
Here I include an example: a very simple trapezoidal rule example where I pass a pointer to the function to be integrated.
// Numerically integrate (*f) from a to b
// using the trapezoidal rule.
double trap(double (*f)(double), double a, double b) {
int N = 10000;
double step = (b-a)/N;
double s = 0;
for (int i=0; i<=N; i++) {
double xi = a + i*step;
if (i == 0 || i == N) { s += (*f)(xi); }
else { s += 2*(*f)(xi); }
}
s *= (b-a)/(2*N);
return s;
}
This works great for simple functions that only take one argument. Example:
double a = trap(sin,0,1);
However, sometimes I may want to integrate something that has more parameters, like a quadratic polynomial. In this example, the coefficients would be defined by the user before the integration. Example code:
// arbitrary quadratic polynomial
double quad(double A, double B, double C, double x) {
return (A*pow(x,2) + B*x + C);
}
Ideally, I would be able to do something like this to integrate it:
double b = trap(quad(1,2,3),0,1);
But clearly that doesn't work. I have gotten around this problem by defining a class that has the coefficients as members and the function of interest as a member function:
class Model {
double A,B,C;
public:
Model() { A = 0; B = 0; C = 0; }
Model(double x, double y, double z) { A = x; B = y; C = z; }
double func(double x) { return (A*pow(x,2)+B*x+C); }
};
However, then my integration function needs to change to take an object as input instead of a function pointer:
// Numerically integrate model.func from a to b
// using the trapezoidal rule.
double trap(Model poly, double a, double b) {
int N = 10000;
double step = (b-a)/N;
double s = 0;
for (int i=0; i<=N; i++) {
double xi = a + i*step;
if (i == 0 || i == N) { s += poly.func(xi); }
else { s += 2*poly.func(xi); }
}
s *= (b-a)/(2*N);
return s;
}
This works fine, but the resulting library is not very independent, since it needs the class Model to be defined somewhere. Also, ideally the Model should be able to change from user-to-user so I wouldn't want to fix it in a header file. I have tried to use function templates and functors to get this to work but it is not very independent since again, the template should be defined in a header file (unless you want to explicitly instantiate, which I don't).
So, to sum up: is there any way I can get my integration functions to accept arbitrary 1D functions with a variable number of input parameters while still remaining independent enough that they can be compiled into a stand-alone library? Thanks in advance for the suggestions.
What you need is templates and std::bind() (or its boost::bind() counterpart if you can't afford C++11). For instance, this is what your trap() function would become:
template<typename F>
double trap(F&& f, double a, double b) {
int N = 10000;
double step = (b-a)/N;
double s = 0;
for (int i=0; i<=N; i++) {
double xi = a + i*step;
if (i == 0 || i == N) { s += f(xi); }
// ^
else { s += 2* f(xi); }
// ^
}
s *= (b-a)/(2*N);
return s;
}
Notice that we are generalizing from function pointers and allow any type of callable object (including a C++11 lambda, for instance) to be passed in. Therefore, the syntax for invoking the user-provided function is not (*f)(param) (which only works for function pointers), but just f(param).
Concerning the flexibility, let's consider two hardcoded functions (and pretend them to be meaningful):
double foo(double x)
{
return x * 2;
}
double bar(double x, double y, double z, double t)
{
return x + y * (z - t);
}
You can now pass either the first function directly to trap(), or the result of binding the last three arguments of the second function to particular values (you are free to choose which arguments to bind):
#include <functional>
int main()
{
trap(foo, 0, 42);
trap(std::bind(bar, std::placeholders::_1, 42, 1729, 0), 0, 42);
}
Of course, you can get even more flexibility with lambdas:
#include <functional>
#include <iostream>
int main()
{
trap(foo, 0, 42);
trap(std::bind(bar, std::placeholders::_1, 42, 1729, 0), 0, 42);
int x = 1729; // Or the result of some computation...
int y = 42; // Or some particular state information...
trap([&] (double d) -> double
{
x += 42 * d; // Or some meaningful computation...
y = 1; // Or some meaningful operation...
return x;
}, 0, 42);
std::cout << y; // Prints 1
}
And you can also pass your own stateful functors to trap(), or some callable objects wrapped in an std::function object (or boost::function if you can't afford C++11). The choice is pretty wide.
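Since the question asks for something that can be compiled into a stand-alone library, the std::function route deserves a sketch of its own: with a fixed std::function<double(double)> parameter, trap() is an ordinary non-template function whose definition can live in a .cpp file. The following is a minimal sketch under that assumption; the Quadratic functor and the extra N parameter are made up for the example.

```cpp
#include <cassert>
#include <functional>

// Non-template trapezoidal rule: the signature is fixed, so only the
// declaration needs to appear in the library header.
double trap(const std::function<double(double)>& f, double a, double b, int N = 10000) {
    assert(N > 0);
    const double step = (b - a) / N;
    double s = 0.0;
    for (int i = 0; i <= N; ++i) {
        const double xi = a + i * step;
        // End points count once, interior points twice.
        s += (i == 0 || i == N) ? f(xi) : 2.0 * f(xi);
    }
    return s * step / 2.0;
}

// User-side stateful functor: the coefficients live in the object.
struct Quadratic {
    double A, B, C;
    double operator()(double x) const { return A * x * x + B * x + C; }
};
```

A caller could then write trap(Quadratic{1.0, 2.0, 3.0}, 0.0, 1.0) or pass a capturing lambda, without the library knowing either type. The price is an indirect call per evaluation, which the template version avoids.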
Here is a live example.
What you are trying to do is make this possible:
trap( quad, 1, 2, 3, 0, 1 );
With C++11 we have alias templates and variadic templates:
template< typename... Ts >
using custom_function_t = double (*) ( double, Ts... );
The above defines custom_function_t, a pointer to a function that takes a double and a variable number of further arguments.
So your trap function becomes:
template< typename... Ts >
double trap( custom_function_t<Ts...> f, Ts... args, double a, double b ) {
int N = 10000;
double step = (b-a)/N;
double s = 0;
for (int i=0; i<=N; i++) {
double xi = a + i*step;
if (i == 0 || i == N) { s += f(xi, args...); }
else { s += 2*f(xi, args...); }
}
s *= (b-a)/(2*N);
return s;
}
Usage:
double foo ( double x ) {
return x;
}
double quad( double x, double A, double B, double C ) {
return(A*pow(x,2) + B*x + C);
}
int main() {
double result_foo = trap( foo, 0, 1 );
double result_quad = trap( quad, 1, 2, 3, 0, 1 ); // 1, 2, 3 == A, B, C respectively
}
Tested on Apple LLVM 4.2 compiler.

RcppParallel Parallelizing distance computation: segfault

I have a matrix, for which I want to compute the distance (let's say Euclidean) between the ith row and every other row(i.e. I want the ith row of the pairwise distance matrix).
#include <Rcpp.h>
#include <cmath>
#include <algorithm>
#include <RcppParallel.h>
//#include <RcppArmadillo.h>
#include <queue>
using namespace std;
using namespace Rcpp;
using namespace RcppParallel;
// [[Rcpp::export]]
double dist_fun(NumericVector row1, NumericVector row2){
double rval = 0;
for (int i = 0; i < row1.length(); i++){
rval += (row1[i] - row2[i]) * (row1[i] - row2[i]);
}
return rval;
}
// [[Rcpp::export]]
NumericVector dist_row(NumericMatrix mat, int i){
NumericVector row(mat.nrow());
NumericMatrix::Row row1 = mat.row(i - 1);
for (int j = 0; j < mat.nrow(); j++){
NumericMatrix::Row row2 = mat.row(j);
row(j) = dist_fun(row1, row2);
}
return row;
}
// [[Rcpp::depends(RcppParallel)]]
struct JsDistance: public Worker {
// input matrix to read from
const NumericMatrix mat;
int i;
// output vector to write to
NumericVector output;
// initialize from Rcpp input and output matrixes (the RMatrix class
// can be automatically converted to from the Rcpp matrix type)
JsDistance(const NumericMatrix mat, int i, NumericVector output)
: mat(mat), i(i), output(output) {}
// function call operator that work for the specified range (begin/end)
void operator()(std::size_t begin, std::size_t end) {
NumericVector row1 = mat.row(i);
for (std::size_t j = begin; j < end; j++) {
NumericVector row2 = mat.row(j);
output[j] = dist_fun(row1, row2);
}
}
};
// [[Rcpp::export]]
NumericVector parallel_dist_row(NumericMatrix mat, int i) {
// allocate the matrix we will return
NumericVector output(mat.nrow());
// create the worker
JsDistance JsDistance(mat, i, output);
// call it with parallelFor
parallelFor(0, mat.nrow(), JsDistance);
return output;
}
The sequential way using Rcpp is the function 'dist_row' as written above. However, the matrix I want to work with is very large, so I want to parallelize it. When I do, I run into a segfault that I don't understand. To trigger the error you can run the following code:
library(Rcpp)
library(RcppParallel)
setThreadOptions(numThreads = 20)
set.seed(42)
X = matrix(rnorm(10000 * 400), 10000, 400)
sourceCpp("question.cpp")
start1 = proc.time()
print(dist_row(X, 2)[1:30])
print(proc.time() - start1)
start2 = proc.time()
print(parallel_dist_row(X, 2)[1:30])
print(proc.time() - start2)
Can someone give me a hint about what I did wrong? Thanks in advance for your time!
=======================================================================
Edit:
inline double d(double a, double b){
return fabs(a - b);
}
// [[Rcpp::depends(RcppParallel)]]
struct dtwDistance: public Worker {
// Input matrix to read from must be of the RMatrix<T> form
// if using Rcpp objects
const RMatrix<double> mat;
int i;
// Output vector to write to must be of the RVector<T> form
// if using Rcpp objects
RVector<double> output;
// initialize from Rcpp input and output matrixes (the RMatrix class
// can be automatically converted to from the Rcpp matrix type)
dtwDistance(const NumericMatrix mat, int i, NumericVector output)
: mat(mat), i(i - 1), output(output) {}
// Note the -1 ^^^^ to match results from prior function
// Function call operator to iterate over a specified range (begin/end)
void operator()(std::size_t begin, std::size_t end) {
RMatrix<double>::Row row1 = mat.row(i);
for (std::size_t j = begin; j < end; ++j) {
RMatrix<double>::Row row2 = mat.row(j);
size_t n = row1.length();
size_t m = row2.length();
NumericMatrix cost(n + 1, m + 1);
for (int ii = 1; ii <= n; ii++){
cost(i, 0) = numeric_limits<double>::infinity();
}
for (int jj = 1; jj <= m; jj++){
cost(0, j) = numeric_limits<double>::infinity();
}
for (int ii = 1; ii <= n; ii++){
for (int jj = 1; jj <= m; jj++){
double dist = d(row1[ii - 1], row2[jj - 1]);
cost(ii, jj) = dist + min(min(cost(ii - 1, jj), cost(ii, jj - 1)), cost(ii - 1, jj - 1));
//cout << ii << ", " << jj << ", " << cost(ii, jj) << "\n";
}
}
output[j] = cost(n, m);
}
}
};
// [[Rcpp::export]]
NumericVector parallel_dist_row_dtw(NumericMatrix mat, int i) {
// allocate the matrix we will return
//RMatrix<double> input(mat);
NumericVector y(mat.nrow());
//RVector<double> output(y);
// create the worker
dtwDistance dtwDistance(mat, i, y);
// call it with parallelFor
parallelFor(0, mat.nrow(), dtwDistance);
return y;
}
The distance I need to calculate is the dynamic time warping distance, which I implemented as above. When it runs, it gives a 'stack imbalance' warning, and there is a segfault after several runs. I'm wondering what the problem is now.
To trigger the problem, I did:
library(Rcpp)
library(RcppParallel)
setThreadOptions(numThreads = 4)
sourceCpp("scripts/chisq_dtw.cpp")
set.seed(42)
X = matrix(rnorm(1000), 100, 10)
parallel_dist_row_dtw(X, 1)
parallel_dist_row_dtw(X, 2)
parallel_dist_row_dtw(X, 3)
parallel_dist_row_dtw(X, 4)
parallel_dist_row_dtw(X, 5)
The issue is that you are not using the thread-safe wrappers around R objects, RMatrix<T> and RVector<T>. These classes are important because the parallel work is executed on background threads, where it is not safe to call the R or Rcpp APIs. The official documentation emphasizes this in the Safe Accessors section.
In particular, we have:
To provide safe and convenient access to the arrays underlying R vectors and matrices RcppParallel introduces several accessor classes:
RVector<T> — Wrap R vectors of various types
RMatrix<T> — Wrap R matrices of various types (also includes Row and Column classes)
To create a thread safe accessor for an Rcpp vector or matrix just construct an instance of RVector or RMatrix with it.
Code Fix
So, your work can be fixed by switching *Matrix to RMatrix<T> and *Vector to RVector<T>.
struct JsDistance: public Worker {
// Input matrix to read from must be of the RMatrix<T> form
// if using Rcpp objects
const RMatrix<double> mat;
int i;
// Output vector to write to must be of the RVector<T> form
// if using Rcpp objects
RVector<double> output;
// initialize from Rcpp input and output matrixes (the RMatrix class
// can be automatically converted to from the Rcpp matrix type)
JsDistance(const NumericMatrix mat, int i, NumericVector output)
: mat(mat), i(i - 1), output(output) {}
// Note the -1 ^^^^ to match results from prior function
// Function call operator to iterate over a specified range (begin/end)
void operator()(std::size_t begin, std::size_t end) {
RMatrix<double>::Row row1 = mat.row(i);
for (std::size_t j = begin; j < end; ++j) {
RMatrix<double>::Row row2 = mat.row(j);
double rval = 0;
for (unsigned int k = 0; k < row1.length(); ++k) {
rval += (row1[k] - row2[k]) * (row1[k] - row2[k]);
}
output[j] = rval;
}
}
};
In particular, the data types used here are of the form RMatrix<double> even for accessing the matrix.
Also, the parallelized version is missing the i - 1 adjustment. To remedy this, I've opted to have it taken care of in the constructor of JsDistance.
Test
set.seed(42)
X = matrix(rnorm(10000 * 400), 10000, 400)
start1 = proc.time()
print(dist_row(X, 2)[1:30])
# [1] 811.8873 0.0000 799.8153 810.1442 720.3232 730.6083 797.8441 781.8066 827.1511 834.1863 842.9392 850.2476 724.5842 673.1428 775.0994
# [16] 805.5752 804.9281 774.9770 799.7669 870.3187 815.1129 934.7581 726.1554 804.2097 758.4943 772.8931 806.6026 715.8257 847.8980 831.7555
print(proc.time() - start1)
# user system elapsed
# 0.22 0.00 0.23
start2 = proc.time()
print(parallel_dist_row(X, 2)[1:30])
# [1] 811.8873 0.0000 799.8153 810.1442 720.3232 730.6083 797.8441 781.8066 827.1511 834.1863 842.9392 850.2476 724.5842 673.1428 775.0994
# [16] 805.5752 804.9281 774.9770 799.7669 870.3187 815.1129 934.7581 726.1554 804.2097 758.4943 772.8931 806.6026 715.8257 847.8980 831.7555
print(proc.time() - start2)
# user system elapsed
# 0.28 0.00 0.06
all.equal(parallel_dist_row(X, 2), dist_row(X, 2))
# [1] TRUE
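The same rule applies to the DTW worker in the question's edit: it allocates a NumericMatrix inside operator(), which calls into R from a worker thread. One way around that, sketched here under assumptions, is to keep the cost table in a std::vector instead. The name dtw_distance and the std::vector<double> inputs are illustrative; inside the worker, each RMatrix<double>::Row would first have to be copied into such a vector. The sketch sets cost(0, 0) = 0 and leaves every other border cell at infinity, which is the usual DTW initialisation (the loops in the edit index the border with i and j rather than ii and jj).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Dynamic time warping distance with |a - b| as the local cost.
// Uses only standard C++ containers, so no R or Rcpp API is touched.
double dtw_distance(const std::vector<double>& a, const std::vector<double>& b) {
    const std::size_t n = a.size();
    const std::size_t m = b.size();
    assert(n > 0 && m > 0);
    const double inf = std::numeric_limits<double>::infinity();

    // (n + 1) x (m + 1) cost table stored row by row, all cells start at infinity.
    std::vector<double> cost((n + 1) * (m + 1), inf);
    auto at = [&cost, m](std::size_t r, std::size_t c) -> double& {
        return cost[r * (m + 1) + c];
    };
    at(0, 0) = 0.0;

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const double dist = std::fabs(a[i - 1] - b[j - 1]);
            at(i, j) = dist + std::min({at(i - 1, j), at(i, j - 1), at(i - 1, j - 1)});
        }
    }
    return at(n, m);
}
```

Whether this removes the 'stack imbalance' warning in the asker's setup has not been tested here; it only removes the R allocation from the worker thread.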
