Understanding strange behavior in Rcpp

I have run into something I cannot wrap my head around. It's part of a larger coding effort, but a minimal example is here:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
Rcpp::List foo(arma::vec & tau2, const arma::vec & nu) {
arma::vec bet = Rcpp::rnorm(3);
tau2 = R::rgamma(1, arma::as_scalar(sum(pow(bet, 2)/nu)));
return Rcpp::List::create(Rcpp::Named("nu") = nu,
Rcpp::Named("tau2") = tau2);
}
(tau2, although a scalar, is declared as a vector here because I want to pass it by reference; see "function pass by reference in RcppArmadillo".)
What is puzzling me is that if I now run the following R code:
n <- 3
m <- matrix(0, n, 1)
for (r in 1:1000) {
tau2 <- 1.0
nu <- matrix(1, n, 1)
upd <- foo(tau2, nu)
}
I get:
error: element-wise division: incompatible matrix dimensions: 3x1 and 18x1
Error in foo(tau2, nu) :
element-wise division: incompatible matrix dimensions: 3x1 and 18x1
where the 18x1 varies; mostly it's 0x1 but it's always a multiple of 3.
Looking at the output:
> nu
[,1] [,2] [,3] [,4]
[1,] 4.165242 4.165242 4.165242 4.165242
[2,] 4.165242 4.165242 4.165242 4.165242
[3,] 4.165242 4.165242 4.165242 4.165242
> upd
$nu
[,1]
[1,] 1
[2,] 1
[3,] 1
$tau2
[,1]
[1,] 4.165242
That is, despite declaring nu as a constant reference (which I do because I do not want it changed), it is altered. The value it is filled with is upd$tau2 (but why?).
Strangely, I can make the behavior go away with any of these seemingly meaningless changes:
putting tau2 <- 1.0 or nu <- matrix(1, n, 1) (or both) outside of the loop
removing the reference in the first argument (i.e. using arma::vec tau2)
not dividing pow(bet, 2) by nu
changing to nu <- rep(1, n)
Perhaps the most confusing part is that if I select the code chunk inside of the loop and repeatedly run it, it works(!). However, if I run the R code using the loop it crashes on the second iteration.
Because I seem to be able to fix the problem, I'm mostly interested in learning what is going on here. I suspect it's just a consequence of my lack of expertise in C++ and recklessness with various variable types, so knowing what is causing all of this would be very valuable.

Two fixes:
Make tau2 a double (mimicking @dirkeddelbuettel here).
Store the NumericVector of length n returned by Rcpp::rnorm() in a temporary variable before copying it into bet.
Code:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
Rcpp::List foo(double tau2, const arma::vec & nu) {
int n = nu.n_elem;
Rcpp::NumericVector x = Rcpp::rnorm(n);
arma::vec bet = arma::vec(x.begin(), n, true, false);
tau2 = R::rgamma(1, arma::as_scalar(sum(pow(bet, 2) / nu)));
return Rcpp::List::create(Rcpp::Named("nu") = nu,
Rcpp::Named("tau2") = tau2);
}
Test case:
n <- 3
m <- matrix(0, n, 1)
for (r in 1:1000) {
tau2 <- 1.0
nu <- matrix(1, n, 1)
upd <- foo(tau2, nu)
}
upd
#> $nu
#> [,1]
#> [1,] 1
#> [2,] 1
#> [3,] 1
#>
#> $tau2
#> [1] 3.292889

If I change the interface to use a double, it all works:
Code
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
Rcpp::List foo(double & tau2, const arma::vec & nu) {
arma::vec bet = Rcpp::rnorm(3);
tau2 = R::rgamma(1, arma::as_scalar(sum(pow(bet, 2)/nu)));
return Rcpp::List::create(Rcpp::Named("nu") = nu,
Rcpp::Named("tau2") = tau2);
}
/*** R
n <- 3
m <- matrix(0, n, 1)
for (r in 1:1000) {
tau2 <- 1.0
nu <- matrix(1, n, 1)
upd <- foo(tau2, nu)
}
*/
Demo
R> sourceCpp("/tmp/hejseb.cpp")
R> n <- 3
R> m <- matrix(0, n, 1)
R> for (r in 1:1000) {
+ tau2 <- 1.0
+ nu <- matrix(1, n, 1)
+ upd <- foo(tau2, nu)
+ }
R> upd
$nu
[,1]
[1,] 1
[2,] 1
[3,] 1
$tau2
[1] 1.77314
R>
I am not sure if those are the numbers you expected. I don't really have time to work through what you are trying to do.

Related

Rcpp select/subset NumericMatrix column by a NumericVector

I can select all the rows of a matrix and a range of columns of a matrix as follows:
library(Rcpp)
cppFunction('
NumericMatrix subset(NumericMatrix x){
return x(_, Range(0, 1));
}
')
However, I would like to select columns based on a NumericVector y which, for instance, could be something like c(0, 1, 0, 0, 1). I tried this:
library(Rcpp)
cppFunction('
NumericMatrix subset(NumericMatrix x, NumericVector y){
return x(_, y);
}
')
but it doesn't compile. How do I do it?

Alas, Rcpp doesn't have great support for non-contiguous views, such as selecting only columns 1 and 4 in a single statement. As you saw, a contiguous range of columns can be selected with Rcpp::Range(). You'll likely want to switch to RcppArmadillo for better control over matrix subsets.
RcppArmadillo subset examples
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat matrix_subset_idx(const arma::mat& x,
const arma::uvec& y) {
// Each element of y must be an integer index between 0 and n_cols - 1.
// Allows for repeated draws from same columns.
return x.cols( y );
}
// [[Rcpp::export]]
arma::mat matrix_subset_logical(const arma::mat& x,
const arma::vec& y) {
// Assumes that y is 0/1 coded.
// find() returns the zero-based indices of the elements where y equals 1.
return x.cols( arma::find(y == 1) );
}
Test
# Sample data
x = matrix(1:15, ncol = 5)
x
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 4 7 10 13
# [2,] 2 5 8 11 14
# [3,] 3 6 9 12 15
# Subset only when 1 (TRUE) is found:
matrix_subset_logical(x, c(0, 1, 0, 0, 1))
# [,1] [,2]
# [1,] 4 13
# [2,] 5 14
# [3,] 6 15
# Subset with an index representing the location
# Note: C++ indices start at 0 not 1!
matrix_subset_idx(x, c(1, 4))
# [,1] [,2]
# [1,] 4 13
# [2,] 5 14
# [3,] 6 15
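The link between the 0/1 mask and the zero-based index vector can be sketched without any library. This is only an illustration in plain C++ of what arma::find(y == 1) is used for above; the helper name mask_to_indices is hypothetical and not part of Rcpp or Armadillo.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not an Rcpp/Armadillo API): return the zero-based
// positions at which a 0/1-coded mask equals 1.
std::vector<std::size_t> mask_to_indices(const std::vector<double>& mask) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < mask.size(); ++i) {
        if (mask[i] == 1.0) {
            idx.push_back(i);  // keep this column
        }
    }
    return idx;
}
```

For the mask c(0, 1, 0, 0, 1) this yields the zero-based indices 1 and 4, i.e. the second and fifth columns of x.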
Pure Rcpp logic
If you do not want to take on a dependency on Armadillo, then the equivalent of the index-based matrix subset in plain Rcpp is:
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::NumericMatrix matrix_subset_idx_rcpp(
Rcpp::NumericMatrix x, Rcpp::IntegerVector y) {
// Number of columns to select
int n_cols_out = y.size();
// Create an uninitialized output matrix
Rcpp::NumericMatrix out = Rcpp::no_init(x.nrow(), n_cols_out);
// Copy each requested column (y holds zero-based indices) into the output.
for(int z = 0; z < n_cols_out; ++z) {
out(Rcpp::_, z) = x(Rcpp::_, y[z]);
}
return out;
}

Eigen giving different results for inplace versus non-inplace versions of function

I am having a weird problem where two functions that should give identical results are disagreeing. I have included the code below. I know that the results of test1 are correct while test2 are wrong.
#include <RcppEigen.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::export]]
Eigen::MatrixXd test1(Eigen::MatrixXd A){
int p = A.rows();
return A.triangularView<Eigen::Lower>().solve(Eigen::MatrixXd::Identity(p,p)).transpose();
}
// [[Rcpp::export]]
Eigen::MatrixXd test2(Eigen::MatrixXd A){
int p = A.rows();
Eigen::MatrixXd I = Eigen::MatrixXd::Identity(p,p);
A.triangularView<Eigen::Lower>().solveInPlace(I);
A.transposeInPlace();
return A;
}
/*** R
A <- rWishart(1, 10, diag(4))[,,1]
A <- t(chol(A))
test1(A)
test2(A)
*/
Here is the output
> test1(A)
[,1] [,2] [,3] [,4]
[1,] 0.2251857 -0.01455544 -0.20205410 -0.08993337
[2,] 0.0000000 0.32498583 -0.06486972 -0.14006616
[3,] 0.0000000 0.00000000 0.60379357 0.27294390
[4,] 0.0000000 0.00000000 0.00000000 0.37409978
> test2(A)
[,1] [,2] [,3] [,4]
[1,] 4.440779 0.1988932 1.5074352 0.04220045
[2,] 0.000000 3.0770572 0.3305895 0.91087781
[3,] 0.000000 0.0000000 1.6561952 -1.20836313
[4,] 0.000000 0.0000000 0.0000000 2.67308367
My question is how do I write an inplace version of test1 that is not incorrect? Also why is test2 incorrect?

The line:
A.triangularView<Eigen::Lower>().solveInPlace(I);
modifies I, not A. So you need to end test2 with:
I.transposeInPlace();
return I;
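To see why the result lands in I, here is a dependency-free sketch of an in-place lower-triangular solve. It is not Eigen code: Matrix, solve_lower_in_place and inverse_transposed are hypothetical names, and the forward substitution is only assumed to mirror what an in-place solve does, namely read the triangular matrix and overwrite the right-hand side.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrix; a hypothetical stand-in for Eigen::MatrixXd.
using Matrix = std::vector<std::vector<double>>;

// Solve L * X = B by forward substitution, overwriting B with X.
// L is only read (its strictly upper part is ignored); B receives the result.
void solve_lower_in_place(const Matrix& L, Matrix& B) {
    const std::size_t p = L.size();
    assert(B.size() == p);
    if (p == 0) return;
    const std::size_t m = B[0].size();
    for (std::size_t col = 0; col < m; ++col) {
        for (std::size_t i = 0; i < p; ++i) {
            double s = B[i][col];
            for (std::size_t j = 0; j < i; ++j) {
                s -= L[i][j] * B[j][col];
            }
            B[i][col] = s / L[i][i];
        }
    }
}

// Transposed inverse of a lower-triangular L, the quantity test1 returns.
Matrix inverse_transposed(const Matrix& L) {
    const std::size_t p = L.size();
    Matrix I(p, std::vector<double>(p, 0.0));
    for (std::size_t i = 0; i < p; ++i) I[i][i] = 1.0;
    solve_lower_in_place(L, I);  // I now holds the inverse; L is unchanged
    Matrix out(p, std::vector<double>(p, 0.0));
    for (std::size_t i = 0; i < p; ++i) {
        for (std::size_t j = 0; j < p; ++j) {
            out[j][i] = I[i][j];  // transpose the solved right-hand side
        }
    }
    return out;
}
```

Returning the matrix that was passed as the right-hand side, rather than the triangular factor, is the whole fix.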

Error when passing an `arma::cube` argument to a function using RcppArmadillo

I am getting the following error when trying to compile using sourceCpp from the Rcpp package:
`my path to R/.../Rcpp/internal/Exporter.h`
no matching function for call to 'arma::Cube<double>::Cube(SEXPREC*&)'
The cube object is the Armadillo equivalent of a three-dimensional array in R.
EDIT: Note that the problem seems to be that the function can't accept an arma::cube object as an argument. If we change arma::cube B to arma::mat B, it does work:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace arma;
// [[Rcpp::export]]
arma::cube ssmooth(arma::mat A,
arma::cube B) {
int ns = A.n_rows;
int nk = A.n_cols;
int np = B.n_rows;
arma::mat C = zeros<mat>(nk, ns);
arma::cube D = zeros<cube>(nk, nk, ns);
return D;
}
I would appreciate any hint.

A basic example works:
R> cppFunction("arma::cube getCube(int n) { arma::cube a(n,n,n);\
a.zeros(); return a; }", depends="RcppArmadillo")
R> getCube(2)
, , 1
[,1] [,2]
[1,] 0 0
[2,] 0 0
, , 2
[,1] [,2]
[1,] 0 0
[2,] 0 0
R>
so either you are doing something wrong or your installation is off.

I had the same issue. The problem seems to be related to the combination of "Rcpp::export" and a cube as an argument of the exported function. My guess is that the converter from SEXP to cube may not be implemented yet (no pun intended ;-)). (Or we are both missing something...)
Workaround when you want to have an arma::cube argument in a Rcpp::export function: get it first as a NumericVector and simply create the cube afterward...
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace arma;
// [[Rcpp::export]]
arma::cube ssmooth(arma::mat A,
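Rcpp::NumericVector B_) {
// Read the dimensions of the R array (Rcpp:: is needed: only namespace arma is imported)
Rcpp::IntegerVector dimB = B_.attr("dim");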
arma::cube B(B_.begin(), dimB[0], dimB[1], dimB[2]);
//(rest of your code unchanged...)
int ns = A.n_rows;
int nk = A.n_cols;
int np = B.n_rows;
arma::mat C = zeros<mat>(nk, ns);
arma::cube D = zeros<cube>(nk, nk, ns);
return D;
}

I think your code fails because implicitly it tries to do casting like this:
#include<RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
cube return_thing(SEXP thing1){
cube thing2 = as<cube>(thing1);
return thing2;
}
/***R
thing <- 1:8
dim(thing) <- c(2, 2, 2)
return_thing(thing)
*/
which doesn't work, whereas it works for matrices:
#include<RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::depends(RcppArmadillo)]]
//[[Rcpp::export]]
mat return_thing(SEXP thing1){
mat thing2 = as<mat>(thing1);
return thing2;
}
/***R
thing <- 1:4
dim(thing) <- c(2, 2)
return_thing(thing)
*/

I am able to read and return an arma cube with the following function :
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
arma::cube return_cube(arma::cube X)
{
return(X);
}
For example, I obtain the following result when I run the following in R :
my_cube <- array(data = rnorm(5 * 3 * 2), dim = c(5,3, 2))
return_cube(my_cube)
, , 1
[,1] [,2] [,3]
[1,] 0.4815994 1.0863765 0.3278728
[2,] 1.4138699 -0.7809922 0.8341867
[3,] 0.6555752 -0.2708001 0.7701501
[4,] 1.1447104 -1.4064894 -0.2653888
[5,] 1.5972670 1.8368235 -2.2814959
, , 2
[,1] [,2] [,3]
[1,] -0.485091067 1.1826162 -0.3524851
[2,] 0.227652584 0.3005968 -0.6079604
[3,] -0.147653664 1.3463318 -1.2238623
[4,] 0.067090580 -0.8982740 -0.8903684
[5,] 0.006421618 -1.7156955 -1.2813880

Insertion of column in armadillo gives different results from insertion in R before passing in RcppArmadillo, what have I done wrong?

Here is the cpp file:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
List mylm1(NumericVector yr, NumericMatrix Xr) {
int n = Xr.nrow(), k = Xr.ncol();
arma::mat X(Xr.begin(), n, k, true);
Rcout << X << std::endl;
arma::mat allOne(n, 1, arma::fill::ones);
X.insert_cols(0, allOne);
Rcout << X << std::endl;
arma::colvec y(yr.begin(), yr.size(), true);
arma::colvec coef = arma::solve(X, y);
arma::colvec resid = y - X*coef;
double sig2 = arma::as_scalar( arma::trans(resid)*resid/(n-k) );
arma::colvec stderrest = arma::sqrt( sig2 * arma::diagvec( arma::inv(arma::trans(X)*X)) );
arma::colvec tval = coef / stderrest;
return Rcpp::List::create(
Rcpp::Named("coefficients") = coef,
Rcpp::Named("stderr") = stderrest,
Rcpp::Named("tval") = tval
) ;
}
// [[Rcpp::export]]
List mylm2(NumericVector yr, NumericMatrix Xr) {
int n = Xr.nrow(), k = Xr.ncol();
arma::mat X(Xr.begin(), n, k, false);
Rcout << X << std::endl;
arma::colvec y(yr.begin(), yr.size(), false);
arma::colvec coef = arma::solve(X, y);
arma::colvec resid = y - X*coef;
double sig2 = arma::as_scalar( arma::trans(resid)*resid/(n-k) );
arma::colvec stderrest = arma::sqrt( sig2 * arma::diagvec( arma::inv(arma::trans(X)*X)) );
return Rcpp::List::create(
Rcpp::Named("coefficients") = coef,
Rcpp::Named("stderr") = stderrest
) ;
}
Then in R:
sourceCpp('b.cpp')
set.seed(1)
x = matrix(rnorm(100), 25, 4)
y = rnorm(25)
mylm1(y, x)
mylm2(y, cbind(1, x))
summary(lm(y~x))
mylm1(y, x)
# 1.0000 -0.6265 -0.0561 0.3981 0.2914
# 1.0000 0.1836 -0.1558 -0.6120 -0.4433
# ...
# 1.0000 -1.9894 -0.1123 -0.9341 -1.2246
# 1.0000 0.6198 0.8811 -1.2536 -0.4734
#
#
# $stderr
# [,1]
# [1,] 0.16765
# [2,] 0.18009
# [3,] 0.24104
# [4,] 0.15117
# [5,] 0.21064
mylm2(y, cbind(1, x))
# 1.0000 -0.6265 -0.0561 0.3981 0.2914
# 1.0000 0.1836 -0.1558 -0.6120 -0.4433
# ...
# 1.0000 -1.9894 -0.1123 -0.9341 -1.2246
# 1.0000 0.6198 0.8811 -1.2536 -0.4734
#
# $stderr
# [,1]
# [1,] 0.17179
# [2,] 0.18454
# [3,] 0.24699
# [4,] 0.15490
# [5,] 0.21584
summary(lm(y~x))
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.1122 0.1718 0.65 0.52
# x1 0.0499 0.1845 0.27 0.79
# x2 0.1076 0.2470 0.44 0.67
# x3 -0.0435 0.1549 -0.28 0.78
# x4 -0.1750 0.2158 -0.81 0.43
According to the output of Rcout, the two methods generate the same X matrix in the end, but they don't give the same standard errors. Why?

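Your formula for the standard errors is wrong: you forgot to adjust k for the added column. In mylm1, k is still the column count of Xr (4), but after insert_cols X has k + 1 = 5 columns, so the residual variance should be divided by n - (k + 1) rather than n - k. The fact that the matrices actually are the same should have made you a little suspicious about the rest of the code.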
And I already mentioned to you in the comments to your last question that you need to treat R2 differently for the intercept and 'no intercept' cases. It is the same thing here with the standard errors, and the root cause of your problem here. In short, no problem with the matrix operations, but a problem when you extract and transform data later.
Take the hint and read the RcppArmadillo sources -- the combination of the fastLm.cpp code and the corresponding R functions is really short. Or else read up on the finer details of how to compute linear least squares and assorted analytics.
Oh, and I also have the same objection to your question title as before. RcppArmadillo is once again not at fault; your usage however is.
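The degrees-of-freedom point can be checked with plain arithmetic on the numbers in the question: n = 25, four predictors, and five columns once the intercept is inserted. The sketch below is plain C++ rather than RcppArmadillo, and both helper names are hypothetical; it rescales a standard error computed with the divisor n - 4 to the divisor n - 5.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: residual variance with divisor n - p, where p is the
// number of columns of the design matrix actually used in the fit.
double residual_variance(double rss, int n, int p) {
    assert(n > p);
    return rss / static_cast<double>(n - p);
}

// Hypothetical helper: convert a standard error computed with the divisor
// (n - k_used) into one computed with the divisor (n - p_actual).
double rescale_stderr(double se, int n, int k_used, int p_actual) {
    assert(n > k_used && n > p_actual);
    const double ratio = static_cast<double>(n - k_used) /
                         static_cast<double>(n - p_actual);
    return se * std::sqrt(ratio);
}
```

With the intercept standard error printed by mylm1, rescale_stderr(0.16765, 25, 4, 5) comes out at roughly 0.1718, which agrees with the value reported by mylm2 and by summary(lm(y~x)).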

Is there a c++ matrix library where I can index matrices with non-contiguous vectors as in R?

I believe boost has a limitation on contiguous or at least step-wise consistent slicing of matrices. In R, I could have a random vector c(5,2,8) and use that to index into a matrix M[c(5,2,8),] for example...

Armadillo supports this as of version 3.0, which was released not even two weeks ago.
Here is a worked example via RcppArmadillo:
R> library(inline)
R>
R> code <- '
+ arma::mat M = Rcpp::as<arma::mat>(m); // normal matrix
+ arma::uvec V = Rcpp::as<arma::uvec>(v); // unsigned int vec
+ arma::mat N = M.cols(V); // index matrix by vec
+ return Rcpp::wrap(N);
+ '
R>
R> fun <- cxxfunction(signature(m="numeric", v="integer"),
+ code,
+ plugin="RcppArmadillo")
R> M <- matrix(1:25,5,5)
R> V <- c(1L, 3L, 5L) - 1 # offset by one for zero indexing
R> fun(M, V)
[,1] [,2] [,3]
[1,] 1 11 21
[2,] 2 12 22
[3,] 3 13 23
[4,] 4 14 24
[5,] 5 15 25
R>
There is a matching function to pick rows rather than columns.
