Concatenate and Lists in Rcpp - c++

I am just getting started with Rcpp, so this might be a very stupid question. Here is the specific question (context is provided below).
What is the Rcpp equivalent of
odes <- c(A = 1.0, B = 2.0, C = 3.0, D = 4.0, E = 5.0, F = 6.0, G = 7.0)
list(odes)
Context - I am trying to solve a system of Ordinary Differential Equations (ODEs) using the deSolve package's vode solver, writing the right-hand side of the ODEs in compiled code with the Rcpp package. The solver expects the function that forms the RHS of the ODEs to return a list; specifically, in this case the RHS from a .R function (which the solver was able to integrate successfully) was of the form
> X
[[1]]
9000000.00 -9000000.00 0.00 19993.04 -19993.04 -19993.04 -9000000.00
and I want my .cpp file to spit out odes as a list of similar form.
Any help here would be much appreciated!!
As suggested below, I am pasting the code to show exactly what I am doing
#include <Rcpp.h>
using namespace Rcpp;

// This is a simple example of exporting a C++ function to R. You can
// source this function into an R session using the Rcpp::sourceCpp
// function (or via the Source button on the editor toolbar). Learn
// more about Rcpp at:
//
//   http://www.rcpp.org/
//   http://adv-r.had.co.nz/Rcpp.html
//   http://gallery.rcpp.org/
//

// [[Rcpp::export]]
List odes_gprotein(double t, NumericVector A, NumericVector p) {
    NumericVector odes_vec(A.length());
    List odes(1);

    double Flux1 = p[1] * A[4] * A[5] - p[0] * A[3];
    double Flux2 = p[2] * A[5] - p[3];
    double Flux3 = p[4] * A[3];
    double Flux4 = p[5] * A[1] * A[6];
    double Flux5 = p[6] * A[0] * A[3];
    double Flux6 = p[7] * A[2];

    odes_vec[0] = (Flux4 - Flux5);
    odes_vec[1] = (-Flux4 + Flux6);
    odes_vec[2] = (Flux5 - Flux6);
    odes_vec[3] = (Flux1 - Flux3);
    odes_vec[4] = (-Flux1);
    odes_vec[5] = (-Flux1 - Flux2);
    odes_vec[6] = (-Flux4 + Flux5);

    odes = List(odes_vec);
    return odes;
}
This function returns (when I supply some values of t, p and A) the following:
> Rcpp::sourceCpp('odes_gprotein.cpp')
> X <- odes_gprotein(0,IC,p)
> str(X)
List of 7
$ : num 9e+06
$ : num -9e+06
$ : num 0
$ : num 19993
$ : num -19993
$ : num -19993
$ : num -9e+06
Whereas what I need is X in the form shown above:
> X
[[1]]
9000000.00 -9000000.00 0.00 19993.04 -19993.04 -19993.04 -9000000.00
where
str(X)
List of 1
$ : num [1:7] 9e+06 -9e+06 0e+00 2e+04 -2e+04 ...
Thank you for your suggestions!

We still do not really know what you want or tried, but here is a minimal existence proof for you:
R> cppFunction('List mylist(IntegerVector x) { return List(x); }')
R> mylist(c(2:4))
[[1]]
[1] 2
[[2]]
[1] 3
[[3]]
[1] 4
R>
In goes a vector, out comes a list. Have a look at the Rcpp examples and eg the Rcpp Gallery site.

It turns out I was creating the List the wrong way. I had to remove
List odes(1);, odes = List(odes_vec); and return odes;
and instead add the following statement at the end:
return Rcpp::List::create(odes_vec);
A better explanation can be found here
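For reference, a minimal sketch of the corrected function (same signature and flux terms as in the question; only the list construction changes):
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List odes_gprotein(double t, NumericVector A, NumericVector p) {
    NumericVector odes_vec(A.length());

    double Flux1 = p[1] * A[4] * A[5] - p[0] * A[3];
    double Flux2 = p[2] * A[5] - p[3];
    double Flux3 = p[4] * A[3];
    double Flux4 = p[5] * A[1] * A[6];
    double Flux5 = p[6] * A[0] * A[3];
    double Flux6 = p[7] * A[2];

    odes_vec[0] = Flux4 - Flux5;
    odes_vec[1] = -Flux4 + Flux6;
    odes_vec[2] = Flux5 - Flux6;
    odes_vec[3] = Flux1 - Flux3;
    odes_vec[4] = -Flux1;
    odes_vec[5] = -Flux1 - Flux2;
    odes_vec[6] = -Flux4 + Flux5;

    // wrap the whole derivative vector as the single element of a list,
    // which is the shape deSolve expects from the RHS function
    return List::create(odes_vec);
}
str() on the result should then show a list of 1 containing a numeric vector of length 7, matching the target output above.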

Related

Implementing the Bartels–Stewart algorithm in Eigen3 -- real matrices only?

Based on this question and its solution -- Implementing the Bartels–Stewart algorithm in Eigen3? -- I am trying to solve Lyapunov equations (AX + XA^T = C) using the Eigen library, but am limited to real matrices.
The R (with C++) code below works, but involves complex numbers. It can definitely be simplified (since in this framing there is no B matrix), but the main difficulty is the reliance on complex numbers. The real Schur form seems to be the standard alternative in this case, but the Eigen function matrix_function_solve_triangular_sylvester then does not work, because the input matrix is not upper triangular but upper block triangular. I would be happy to see suggestions to a) remove the need for complex numbers and, if that is possible, b) any efficiency improvements.
library(expm)
library(Rcpp)
library(RcppEigen)
library(inline)

# R -----------------------------------------------------------------------
d <- 6                                    # dimensions
A <- matrix(rnorm(d^2), d, d)             # continuous-time transition
G <- matrix(rnorm(d^2), d, d)
C <- G %*% t(G)                           # continuous-time pos. def. error
AHATCH <- A %x% diag(d) + diag(d) %x% A
Xtrue <- matrix(-solve(AHATCH, c(C)), d)  # asymptotic error from continuous time

# C++ in R ------------------------------------------------------------------
sylcpp <- '
  using Eigen::Map;
  using Eigen::MatrixXd;

  // Map the double matrix A from Ar
  const Map<MatrixXd> A(as<Map<MatrixXd> >(Ar));
  // Map the double matrix Q from Qr
  const Map<MatrixXd> Q(as<Map<MatrixXd> >(Qr));

  Eigen::MatrixXd B = A.transpose();

  Eigen::ComplexSchur<Eigen::MatrixXd> SchurA(A);
  Eigen::MatrixXcd R = SchurA.matrixT();
  Eigen::MatrixXcd U = SchurA.matrixU();

  Eigen::ComplexSchur<Eigen::MatrixXd> SchurB(B);
  Eigen::MatrixXcd S = SchurB.matrixT();
  Eigen::MatrixXcd V = SchurB.matrixU();

  Eigen::MatrixXcd F = (U.adjoint() * Q) * V;
  Eigen::MatrixXcd Y =
      Eigen::internal::matrix_function_solve_triangular_sylvester(R, S, F);
  Eigen::MatrixXd X = ((U * Y) * V.adjoint()).real();

  return wrap(X);
'

syl <- cxxfunction(signature(Ar = "matrix", Qr = "matrix"), sylcpp, plugin = "RcppEigen")

X <- syl(A, -C)
X - Xtrue  # approx zero
In principle, you could use RealSchur instead.
That will produce a quasi-triangular real R.
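A rough sketch of that direction (the factorisation only, not a full solver; real_schur is just an illustrative name): Eigen::RealSchur gives an orthogonal U and a quasi-upper-triangular T with A = U T U^T, and solving the resulting quasi-triangular Sylvester equation would still require a block back-substitution, since matrix_function_solve_triangular_sylvester assumes strictly triangular inputs.
#include <RcppEigen.h>
// [[Rcpp::depends(RcppEigen)]]

// Sketch only: return the real Schur factors of a square matrix A,
// where T is quasi-upper-triangular (1x1 and 2x2 diagonal blocks).
// [[Rcpp::export]]
Rcpp::List real_schur(const Eigen::Map<Eigen::MatrixXd> A) {
    Eigen::RealSchur<Eigen::MatrixXd> schur(A);
    return Rcpp::List::create(
        Rcpp::Named("T") = Eigen::MatrixXd(schur.matrixT()),   // quasi-triangular factor
        Rcpp::Named("U") = Eigen::MatrixXd(schur.matrixU()));  // orthogonal factor
}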

Rcpp fast statistical mode function with vector input of any type

I'm trying to build a super fast mode function for R to use for aggregating large categorical datasets. The function should take vector input of all supported R types and return the mode. I have read this post, this help page and others, but I was not able to make the function take in all R data types. My code now works for numeric vectors; I am relying on Rcpp sugar wrapper functions:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
int Mode(NumericVector x, bool narm = false)
{
    if (narm) x = x[!is_na(x)];
    NumericVector ux = unique(x);
    int y = ux[which_max(table(match(x, ux)))];
    return y;
}
In addition I was wondering if the 'narm' argument can be renamed 'na.rm' without giving errors, and of course if there is a faster way to code a mode function in C++, I would be grateful to know about it.
In order to make the function work for any vector input, you could implement #JosephWood's algorithm for any data type you want to support and call it from a switch(TYPEOF(x)). But that would be lots of code duplication. Instead, it is better to make a generic function that can work on any Vector<RTYPE> argument. If we follow R's paradigm that everything is a vector and let the function also return a Vector<RTYPE>, then we can make use of RCPP_RETURN_VECTOR. Note that we need C++11 to be able to pass additional arguments to the function called by RCPP_RETURN_VECTOR. One tricky thing is that you need the storage type for Vector<RTYPE> in order to create a suitable std::unordered_map. Here Rcpp::traits::storage_type<RTYPE>::type comes to the rescue. However, std::unordered_map does not know how to deal with complex numbers from R. For simplicity, I am disabling this special case.
Putting it all together:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::plugins(cpp11)]]
#include <unordered_map>

template <int RTYPE>
Vector<RTYPE> fastModeImpl(Vector<RTYPE> x, bool narm){
    if (narm) x = x[!is_na(x)];
    int myMax = 1;
    Vector<RTYPE> myMode(1);
    // special case for factors == INTSXP with "class" and "levels" attribute
    if (x.hasAttribute("levels")){
        myMode.attr("class") = x.attr("class");
        myMode.attr("levels") = x.attr("levels");
    }
    std::unordered_map<typename Rcpp::traits::storage_type<RTYPE>::type, int> modeMap;
    modeMap.reserve(x.size());

    for (std::size_t i = 0, len = x.size(); i < len; ++i) {
        auto it = modeMap.find(x[i]);

        if (it != modeMap.end()) {
            ++(it->second);
            if (it->second > myMax) {
                myMax = it->second;
                myMode[0] = x[i];
            }
        } else {
            modeMap.insert({x[i], 1});
        }
    }

    return myMode;
}

template <>
Vector<CPLXSXP> fastModeImpl(Vector<CPLXSXP> x, bool narm) {
    stop("Not supported SEXP type!");
}

// [[Rcpp::export]]
SEXP fastMode(SEXP x, bool narm = false){
    RCPP_RETURN_VECTOR(fastModeImpl, x, narm);
}

/*** R
set.seed(1234)
s <- sample(1e5, replace = TRUE)
fastMode(s)
fastMode(s + 0.1)
l <- sample(c(TRUE, FALSE), 11, replace = TRUE)
fastMode(l)
c <- sample(letters, 1e5, replace = TRUE)
fastMode(c)
f <- as.factor(c)
fastMode(f)
*/
Output:
> set.seed(1234)
> s <- sample(1e5, replace = TRUE)
> fastMode(s)
[1] 85433
> fastMode(s + 0.1)
[1] 85433.1
> l <- sample(c(TRUE, FALSE), 11, replace = TRUE)
> fastMode(l)
[1] TRUE
> c <- sample(letters, 1e5, replace = TRUE)
> fastMode(c)
[1] "z"
> f <- as.factor(c)
> fastMode(f)
[1] z
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
As noted above, the used algorithm comes from Joseph Wood's answer, which has been explicitly dual-licensed under CC-BY-SA and GPL >= 2. I am following Joseph and hereby license the code in this answer under the GPL (version 2 or later) in addition to the implicit CC-BY-SA license.
In your Mode function, since you are mostly calling sugar wrapper functions, you won't see that much improvement over base R. In fact, simply writing a faithful base R translation, we have:
baseMode <- function(x, narm = FALSE) {
    if (narm) x <- x[!is.na(x)]
    ux <- unique(x)
    ux[which.max(table(match(x, ux)))]
}
And benchmarking, we have:
set.seed(1234)
s <- sample(1e5, replace = TRUE)
library(microbenchmark)
microbenchmark(Mode(s), baseMode(s), times = 10, unit = "relative")
Unit: relative
expr min lq mean median uq max neval
Mode(s) 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 10
baseMode(s) 1.490765 1.645367 1.571132 1.616061 1.637181 1.448306 10
Typically, when we undertake the effort of writing our own compiled code, we would expect bigger gains. Simply wrapping these already efficient compiled functions in Rcpp isn't going to magically get you the gains you expect. In fact, on larger examples the base solution is faster. Observe:
set.seed(1234)
sBig <- sample(1e6, replace = TRUE)
system.time(Mode(sBig))
user system elapsed
1.410 0.036 1.450
system.time(baseMode(sBig))
user system elapsed
0.915 0.025 0.943
To address your question of writing a faster mode function, we can make use of std::unordered_map, which is very similar to table underneath the hood (i.e. they are both hash tables at their heart). Additionally, since you are returning a single integer, we can safely assume that we can replace NumericVector with IntegerVector and also that you are not concerned with returning every value that occurs the most.
The algorithm below can be modified to return the true mode, but I will leave that as an exercise (hint: you will need std::vector along with taking some sort of action when it->second == myMax). N.B. you will also need to add // [[Rcpp::plugins(cpp11)]] at the top of your cpp file for std::unordered_map and auto.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::plugins(cpp11)]]
#include <unordered_map>

// [[Rcpp::export]]
int fastIntMode(IntegerVector x, bool narm = false) {
    if (narm) x = x[!is_na(x)];
    int myMax = 1;
    int myMode = 0;
    std::unordered_map<int, int> modeMap;
    modeMap.reserve(x.size());

    for (std::size_t i = 0, len = x.size(); i < len; ++i) {
        auto it = modeMap.find(x[i]);

        if (it != modeMap.end()) {
            ++(it->second);
            if (it->second > myMax) {
                myMax = it->second;
                myMode = x[i];
            }
        } else {
            modeMap.insert({x[i], 1});
        }
    }

    return myMode;
}
And the benchmarks:
microbenchmark(Mode(s), baseMode(s), fastIntMode(s), times = 15, unit = "relative")
Unit: relative
expr min lq mean median uq max neval
Mode(s) 6.428343 6.268131 6.622914 6.134388 6.881746 7.78522 15
baseMode(s) 9.757491 9.404101 9.454857 9.169315 9.018938 10.16640 15
fastIntMode(s) 1.000000 1.000000 1.000000 1.000000 1.000000 1.00000 15
Now we are talking... about 6x faster than the original and 9x faster than base. They all return the same value:
fastIntMode(s)
##[1] 85433
baseMode(s)
##[1] 85433
Mode(s)
##[1] 85433
And for our larger example:
## base R returned in 0.943s
system.time(fastIntMode(s))
user system elapsed
0.217 0.006 0.224
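As an aside, the "true mode" exercise hinted at above (returning every value that ties for the highest count) could look roughly like the following sketch; fastIntModes is a hypothetical name, it keeps the integer-only simplification, and it is just one possible way to do it:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::plugins(cpp11)]]
#include <unordered_map>
#include <vector>

// Sketch only: return every value whose count equals the maximum count,
// in the order in which each value first reaches that count.
// [[Rcpp::export]]
IntegerVector fastIntModes(IntegerVector x, bool narm = false) {
    if (narm) x = x[!is_na(x)];
    int myMax = 0;
    std::vector<int> myModes;
    std::unordered_map<int, int> modeMap;
    modeMap.reserve(x.size());

    for (R_xlen_t i = 0, len = x.size(); i < len; ++i) {
        int cnt = ++modeMap[x[i]];         // count this occurrence
        if (cnt > myMax) {                 // new unique maximum: reset the tie set
            myMax = cnt;
            myModes.assign(1, x[i]);
        } else if (cnt == myMax) {         // tie with the current maximum: append
            myModes.push_back(x[i]);
        }
    }

    return wrap(myModes);
}
For an input such as c(1L, 1L, 2L, 2L, 3L), this sketch would return both 1 and 2.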
In addition to the implicit CC-BY-SA license I hereby license the code in this answer under the GPL >= 2.
To follow up with some shameless self-promotion: I have now published a package collapse on CRAN which includes a full set of Fast Statistical Functions, amongst them the generic function fmode. The implementation is based on index hashing and is even faster than the solution above. fmode can be used to perform simple, grouped and/or weighted mode calculations on vectors, matrices, data.frames and dplyr grouped tibbles. Syntax:
fmode(x, g = NULL, w = NULL, ...)
where x is a vector, matrix, data.frame or grouped_df, g is a grouping vector or list of grouping vectors, and w is a vector of weights. A compact solution to categorical and mixed aggregation problems is further provided by the function collap. The code
collap(data, ~ id1 + id2, FUN = fmean, catFUN = fmode)
aggregates the mixed type data.frame data applying fmean to numeric and fmode to categorical columns. More customized calls are also possible. Together with the Fast Statistical Functions, collap is just as fast as data.table on large numeric data, and categorical and weighted aggregations are significantly faster than anything that can presently be done with data.table.

Detecting and omitting na values from a std vector in Rcpp

I have a std::vector whose elements need to be summed up after checking whether there are any NA values (and obviously removing the NA values if there are any). I have to do it in Rcpp. Now, for a numeric vector in Rcpp (NumericVector), it is very easy, as the code below shows:
cppFunction("
double res ( NumericVector x){
NumericVector v = x[! is_na(x)];
return sum(v);
}
")
So for a vector x, it easily gives the sum as follows:
x<- c(NaN,1,2)
res(x)
[1] 3
Now, for a std::vector x, how can I do the same?
You should be able to use RcppHoney (also on CRAN here) which brings the vectorised idioms of Rcpp Sugar (which has vectorised NA tests just like R has) to any iterable container -- hence also STL ones.
See e.g. the intro vignette for this example of combining different vector types and classes into a single expression:
// [[Rcpp::export]]
Rcpp::NumericVector example_manually_hooked() {

    // We manually hooked std::list in to RcppHoney so we'll create one
    std::list< int > l;
    l.push_back(1); l.push_back(2); l.push_back(3); l.push_back(4); l.push_back(5);

    // std::vector is already hooked in to RcppHoney in default_hooks.hpp so
    // we'll create one of those too
    std::vector< int > v(l.begin(), l.end());

    // And for good measure, let's create an Rcpp::NumericVector which is
    // also hooked by default
    Rcpp::NumericVector v2(v.begin(), v.end());

    // Now do some weird operations incorporating std::vector, std::list,
    // Rcpp::NumericVector and some RcppHoney functions and return it. The
    // return value will be equal to the following R snippet:
    //     v <- 1:5
    //     result <- 42 + v + v + log(v) - v - v + sqrt(v) + -v + 42
    // We can store our result in any of RcppHoney::LogicalVector,
    // RcppHoney::IntegerVector, or RcppHoney::NumericVector and simply return
    // it to R. These classes inherit from their Rcpp counterparts and add a
    // new constructor. The only copy of the data, in this case, is when we
    // assign our expression to retval. Since it is then a "native" R type,
    // returning it is a shallow copy. Alternatively we could write this as:
    //     return Rcpp::wrap(1 + v + RcppHoney::log(v) - v - 1
    //         + RcppHoney::sqrt(v) + -v2);
    RcppHoney::NumericVector retval
        = 42 + l + v + RcppHoney::log(v) - v - l + RcppHoney::sqrt(v) + -v2
          + 42;

    return retval;
}
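If you would rather not add a dependency, a minimal sketch in plain Rcpp/STL terms could test each element with R's C-level ISNAN() macro (which catches both NA and NaN for doubles) while summing the std::vector; res_std is a hypothetical name:
#include <Rcpp.h>

// Sketch only: sum a std::vector<double> coming from R, skipping NA/NaN entries.
// [[Rcpp::export]]
double res_std(std::vector<double> x) {
    double total = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (!ISNAN(x[i])) total += x[i];   // ISNAN() comes from R's C API
    }
    return total;
}
Called from R as res_std(c(NaN, 1, 2)), this should likewise give 3; NA_real_ is itself a NaN payload and is therefore also caught by ISNAN().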

Error when calling C++ function from R and integrate it

I want to numerically integrate a 1-dimensional function (that was written in C++) with the R function integrate. As a short example, I coded the function myfunc in C++.
#include <cmath>
#include <Rcpp.h>
using namespace std;

// [[Rcpp::export]]
double myfunc (double x){
    double result;
    result = exp( -0.5*pow(x,2) + 2*x );
    return result;
}
After loading myfunc in R and integrating it, I obtain the following error:
library(Rcpp)
sourceCpp("myfunc.cpp")
integrate(myfunc,lower=0,upper=10)
Error in f(x, ...) : Expecting a single value: [extent=21].
Can anyone explain what this error means and how I can solve this problem?
From help("integrate"):
f must accept a vector of inputs and produce a vector of function evaluations at those points. The Vectorize function may be helpful to convert f to this form.
You have created your function to accept a single value, a double, so when integrate() tries to pass it a vector, it rightfully complains. So, try
#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::NumericVector myfunc(Rcpp::NumericVector x){
    return exp(-0.5 * pow(x, 2) + 2 * x);
}

/*** R
integrate(myfunc, lower = 0, upper = 10)
*/
Resulting in
integrate(myfunc, lower = 0, upper = 10)
# 18.10025 with absolute error < 5.1e-08
Or, using myfunc() compiled from your C++ code from above,
f <- Vectorize(myfunc)
integrate(f, lower = 0, upper = 10)
# 18.10025 with absolute error < 5.1e-08

Trying to write a setdiff() function using RcppArmadillo gives compilation error

I'm trying to write a sort of analogue of R's setdiff() function in C++ using RcppArmadillo. My rather crude approach:
// [[Rcpp::export]]
arma::uvec my_setdiff(arma::uvec x, arma::uvec y){
    // Coefficients of the unsigned integer vector y form a subset of the
    // coefficients of the unsigned integer vector x.
    // Returns the set difference between the coefficients of x and those of y.
    int n2 = y.n_elem;
    uword q1;
    for (int j = 0; j < n2; j++){
        q1 = find(x == y[j]);
        x.shed_row(q1);
    }
    return x;
}
fails at compilation time. The error reads:
fnsauxarma.cpp:622:29: error: no matching function for call to ‘arma::Col<double>::shed_row(const arma::mtOp<unsigned int, arma::mtOp<unsigned int, arma::Col<double>, arma::op_rel_eq>, arma::op_find>)’
I really have no idea what's going on, any help or comments would be greatly appreciated.
The problem is that arma::find returns a uvec, and the compiler does not know how to make the implicit conversion to arma::uword, as pointed out by @mtall. You can help the compiler out by using the templated arma::conv_to<T>::from() function. I also included another version of my_setdiff that returns an Rcpp::NumericVector, because although the first version returns the correct values, it is technically a matrix (i.e. it has dimensions), and I assume you would want this to be as compatible with R's setdiff as possible. This is accomplished by setting the dim attribute of the return vector to NULL, using R_NilValue and the Rcpp::attr member function.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
arma::uvec my_setdiff(arma::uvec& x, const arma::uvec& y){
    for (size_t j = 0; j < y.n_elem; j++) {
        arma::uword q1 = arma::conv_to<arma::uword>::from(arma::find(x == y[j]));
        x.shed_row(q1);
    }
    return x;
}

// [[Rcpp::export]]
Rcpp::NumericVector my_setdiff2(arma::uvec& x, const arma::uvec& y){
    for (size_t j = 0; j < y.n_elem; j++) {
        arma::uword q1 = arma::conv_to<arma::uword>::from(arma::find(x == y[j]));
        x.shed_row(q1);
    }
    Rcpp::NumericVector x2 = Rcpp::wrap(x);
    x2.attr("dim") = R_NilValue;
    return x2;
}

/*** R
x <- 1:8
y <- 2:6

R> all.equal(setdiff(x,y), my_setdiff(x,y))
#[1] "Attributes: < target is NULL, current is list >" "target is numeric, current is matrix"

R> all.equal(setdiff(x,y), my_setdiff2(x,y))
#[1] TRUE

R> setdiff(x,y)
#[1] 1 7 8

R> my_setdiff(x,y)
#     [,1]
#[1,]    1
#[2,]    7
#[3,]    8

R> my_setdiff2(x,y)
#[1] 1 7 8
*/
Edit:
For the sake of completeness, here is a more robust version of setdiff than the two implementations presented above:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// [[Rcpp::export]]
Rcpp::NumericVector arma_setdiff(arma::uvec& x, arma::uvec& y){
    x = arma::unique(x);
    y = arma::unique(y);

    for (size_t j = 0; j < y.n_elem; j++) {
        arma::uvec q1 = arma::find(x == y[j]);
        if (!q1.empty()) {
            x.shed_row(q1(0));
        }
    }

    Rcpp::NumericVector x2 = Rcpp::wrap(x);
    x2.attr("dim") = R_NilValue;
    return x2;
}

/*** R
x <- 1:10
y <- 2:8

R> all.equal(setdiff(x,y), arma_setdiff(x,y))
#[1] TRUE

X <- 1:6
Y <- c(2,2,3)

R> all.equal(setdiff(X,Y), arma_setdiff(X,Y))
#[1] TRUE
*/
The previous versions would throw an error if you passed them vectors with non-unique elements, e.g.
R> my_setdiff2(X,Y)
error: conv_to(): given object doesn't have exactly one element
To solve the problem and more closely mirror R's setdiff, we just make x and y unique. Additionally, I switched out the arma::conv_to<>::from with q1(0) (where q1 is now a uvec instead of a uword), because uvec's are just a vector of uwords, and the explicit cast seemed a little inelegant.
I've used std::set_difference from the STL instead, converting back and forth from arma::uvec.
#include <RcppArmadillo.h>
#include <algorithm>
#include <iterator>   // for std::inserter
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
arma::uvec std_setdiff(arma::uvec& x, arma::uvec& y) {
    std::vector<int> a = arma::conv_to< std::vector<int> >::from(arma::sort(x));
    std::vector<int> b = arma::conv_to< std::vector<int> >::from(arma::sort(y));
    std::vector<int> out;

    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.end()));

    return arma::conv_to<arma::uvec>::from(out);
}
Edit: I thought a performance comparison might be in order. The difference becomes smaller when the relative sizes of the sets are in the opposite order.
a <- sample.int(350)
b <- sample.int(150)
microbenchmark::microbenchmark(std_setdiff(a, b), arma_setdiff(a, b))
> Unit: microseconds
> expr min lq mean median uq max neval cld
> std_setdiff(a, b) 11.548 14.7545 17.29930 17.107 19.245 36.779 100 a
> arma_setdiff(a, b) 60.727 65.0040 71.77804 66.714 72.702 138.133 100 b
The questioner may already have got the answer; however, the following template version may be more general. It is modelled on the setdiff function in MATLAB, although note that, as written, it returns the uncommon elements of both inputs (the symmetric difference) rather than only P - Q.
If P and Q are two sets, then their difference is given by P - Q or Q - P. If P = {1, 2, 3, 4} and Q = {4, 5, 6}, P - Q means the elements of P which are not in Q, i.e., in the above example P - Q = {1, 2, 3}.
/* setdiff(t1, t2) is similar to the setdiff() function in MATLAB. It removes the
   common elements and returns the uncommon elements of the vectors t1 and t2. */
template <typename T>
T setdiff(T t1, T t2)
{
    // note: this zeroes out the common elements and then drops zeros,
    // so it assumes 0 is not a legitimate value in either vector
    T Intersection_Elements;
    arma::uvec iA, iB;
    arma::intersect(Intersection_Elements, iA, iB, t1, t2);

    for (arma::uword i = 0; i < iA.n_elem; i++)
    {
        t1(iA(i)) = 0;
    }
    for (arma::uword i = 0; i < iB.n_elem; i++)
    {
        t2(iB(i)) = 0;
    }

    T t1_t2_vec = arma::join_vert(t1, t2);
    T DiffVec = arma::nonzeros(t1_t2_vec);
    return DiffVec;
}
Any suggestions for improving the performance of the algorithm are welcome.