Column-wise ordering of matrix - c++

I am new to RcppArmadillo. I am wondering how I can order a matrix column-wise by the indices in a given vector. I know how to do it in R, but my RcppArmadillo attempts do not work. For example, in R,
aa = c(2,4,1,3)
# [1] 2 4 1 3
bb = cbind(c(1,5,4,2),c(3,1,0,8))
# [,1] [,2]
# [1,] 1 3
# [2,] 5 1
# [3,] 4 0
# [4,] 2 8
Trying to subset with R gives:
cc = bb[aa,]
# [,1] [,2]
# [1,] 5 1
# [2,] 2 8
# [3,] 1 3
# [4,] 4 0
I've tried the following using RcppArmadillo:
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
List example(arma::vec aa,arma::mat bb){
int p = bb.n_rows;
int n = aa.size();
arma::uvec index_aa=sort_index(aa);;
List cc(n);
for(int it=0; it<p; it++){
cc(it) = bb.each_col();
}
return List::create(cc);
}
and,
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
List example(arma::vec aa,arma::mat bb){
arma::uvec index_aa=sort_index(aa);
return List::create(bb.elem(index_aa));
}

Not sure why you are sorting the index here, as that introduces a different order from the one bb[aa,] produces.
Anyway, the idea here is to subset using .rows(), which requires a uvec (unsigned integer vector) of indices. As aa contains R indices, we can translate them from R to C++ by subtracting 1, moving from a 1-based index system to a 0-based index system.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat example_subset(arma::uvec aa, arma::mat bb){
// Convert to a C++ index from R (1 to 0-based indices)
aa = aa - 1;
return bb.rows(aa);
}
Test code:
aa = c(2, 4, 1, 3)
bb = cbind(c(1, 5, 4, 2), c(3, 1, 0, 8))
cpp_cc = example_subset(aa, bb)
r_cc = cbind(c(5,2,1,4),c(1,8,3,0))
all.equal(cpp_cc, r_cc)
# [1] TRUE
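To see why sorting the index changes the result, here is a minimal standard-library sketch (no Armadillo needed) that contrasts the two index vectors. The helper names sort_order and to_zero_based are hypothetical: sort_order is only a stand-in for what arma::sort_index() computes, and to_zero_based mirrors the aa - 1 step above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for arma::sort_index(): the 0-based positions
// that would sort v in ascending order.
std::vector<std::size_t> sort_order(const std::vector<double>& v) {
    std::vector<std::size_t> idx(v.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::stable_sort(idx.begin(), idx.end(),
                     [&v](std::size_t a, std::size_t b) { return v[a] < v[b]; });
    return idx;
}

// Convert R-style 1-based positions to C++ 0-based positions.
std::vector<std::size_t> to_zero_based(const std::vector<double>& v) {
    std::vector<std::size_t> idx;
    idx.reserve(v.size());
    for (double x : v) {
        assert(x >= 1.0);  // R indices start at 1
        idx.push_back(static_cast<std::size_t>(x) - 1);
    }
    return idx;
}
```

For aa = (2, 4, 1, 3), sort_order gives (2, 0, 3, 1), while to_zero_based gives (1, 3, 0, 2). Only the latter selects the same rows as bb[aa,].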

Related

Submatrix summation doesn't work for NumericMatrix Rcpp

I observed the following weird situation when defining an Rcpp function.
The first function works perfectly and calculates the sum of the submatrix s:
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
double Sub(NumericMatrix m){
NumericMatrix s = m(Range(0,0),Range(0,1));
return sum(s);
}
However, when I alter the code in the following way
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
double Sub(NumericMatrix m){
return sum(m(Range(0,0),Range(0,1)));
}
It doesn't work, which is really counterintuitive. The error I get is: no matching function for call to 'sum(Rcpp::Matrix<14>::Sub)'
I just skip the step where I define the submatrix NumericMatrix s.
A matrix that can be used to check this is the following:
n = matrix(c(6090,16,0,0,618,1036,3,0,99,0,312,4,25,0,0,3,0,0,0,0,1794,0,0,0,0),5,5,byrow=TRUE)
n
[,1] [,2] [,3] [,4] [,5]
[1,] 6090 16 0 0 618
[2,] 1036 3 0 99 0
[3,] 312 4 25 0 0
[4,] 3 0 0 0 0
[5,] 1794 0 0 0 0

Rcpp select/subset NumericMatrix column by a NumericVector

I can select all the rows of a matrix and a range of columns of a matrix as follows:
library(Rcpp)
cppFunction('
NumericMatrix subset(NumericMatrix x){
return x(_, Range(0, 1));
}
')
However, I would like to select columns based on a NumericVector y which, for instance, could be something like c(0, 1, 0, 0, 1). I tried this:
library(Rcpp)
cppFunction('
NumericMatrix subset(NumericMatrix x, NumericVector y){
return x(_, y);
}
')
but it doesn't compile. How do I do it?
Alas, Rcpp doesn't have great support for non-contiguous views, such as selecting only columns 1 and 4 in a single statement. As you saw, contiguous views are accessible with Rcpp::Range(), and all rows or all columns with _. You'll likely want to upgrade to RcppArmadillo for better control over matrix subsets.
RcppArmadillo subset examples
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat matrix_subset_idx(const arma::mat& x,
const arma::uvec& y) {
// Each element of y must be an integer between 0 and (number of columns - 1).
// Allows for repeated draws from same columns.
return x.cols( y );
}
// [[Rcpp::export]]
arma::mat matrix_subset_logical(const arma::mat& x,
const arma::vec& y) {
// Assumes that y is 0/1 coded.
// find() retrieves the integer indices of the positions where y equals 1.
return x.cols( arma::find(y == 1) );
}
Test
# Sample data
x = matrix(1:15, ncol = 5)
x
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 4 7 10 13
# [2,] 2 5 8 11 14
# [3,] 3 6 9 12 15
# Subset only when 1 (TRUE) is found:
matrix_subset_logical(x, c(0, 1, 0, 0, 1))
# [,1] [,2]
# [1,] 4 13
# [2,] 5 14
# [3,] 6 15
# Subset with an index representing the location
# Note: C++ indices start at 0 not 1!
matrix_subset_idx(x, c(1, 4))
# [,1] [,2]
# [1,] 4 13
# [2,] 5 14
# [3,] 6 15
Pure Rcpp logic
If you do not want to take on the dependency of armadillo, then the equivalent for the matrix subset in Rcpp is:
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::NumericMatrix matrix_subset_idx_rcpp(
Rcpp::NumericMatrix x, Rcpp::IntegerVector y) {
// Determine the number of columns to select
int n_cols_out = y.size();
// Create an output matrix
Rcpp::NumericMatrix out = Rcpp::no_init(x.nrow(), n_cols_out);
// Loop through each requested column and copy the data.
for(int z = 0; z < n_cols_out; ++z) {
out(Rcpp::_, z) = x(Rcpp::_, y[z]);
}
return out;
}
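For the 0/1-coded case, the mask first has to be turned into column positions, which is what arma::find(y == 1) does in the Armadillo version. A minimal standard-library sketch of that step, assuming a strictly 0/1-coded mask (mask_to_indices is a hypothetical helper name, not an Rcpp or Armadillo function):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: 0-based positions at which a 0/1-coded mask equals 1.
std::vector<std::size_t> mask_to_indices(const std::vector<double>& mask) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < mask.size(); ++i) {
        assert(mask[i] == 0.0 || mask[i] == 1.0);  // assumes 0/1 coding
        if (mask[i] == 1.0) {
            idx.push_back(i);
        }
    }
    return idx;
}
```

For the mask (0, 1, 0, 0, 1) this yields the 0-based positions 1 and 4, which could then be fed to an index-based subset such as matrix_subset_idx_rcpp() above.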

Error when passing `arma::cube` argument to function using RcppArmadillo

I am getting the following error when trying to compile using sourceCpp from the Rcpp package:
`my path to R/.../Rcpp/internal/Exporter.h`
no matching function for call to 'arma::Cube<double>::Cube(SEXPREC*&)'
The object cube is the Armadillo equivalent of an array in R.
EDIT: Note that the problem seems to be that the function can't accept an arma::cube object as an argument. If we change arma::cube B to arma::mat B, it does work:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace arma;
// [[Rcpp::export]]
arma::cube ssmooth(arma::mat A,
arma::cube B) {
int ns = A.n_rows;
int nk = A.n_cols;
int np = B.n_rows;
arma::mat C = zeros<mat>(nk, ns);
arma::cube D = zeros<cube>(nk, nk, ns);
return D;
}
I would appreciate any hint.
A basic example works:
R> cppFunction("arma::cube getCube(int n) { arma::cube a(n,n,n);\
a.zeros(); return a; }", depends="RcppArmadillo")
R> getCube(2)
, , 1
[,1] [,2]
[1,] 0 0
[2,] 0 0
, , 2
[,1] [,2]
[1,] 0 0
[2,] 0 0
R>
so either you are doing something wrong or your installation is off.
I had the same issue. The problem seems to be related to the combination of "Rcpp::export" and a cube as an argument of the exported function. My guess is that the converter from SEXP to cube may not be implemented yet (no pun intended ;-)). (Or we are both missing something...)
Workaround when you want to have an arma::cube argument in an Rcpp::export function: get it first as a NumericVector and simply create the cube afterward...
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp; // needed for the unqualified NumericVector and IntegerVector
using namespace arma;
// [[Rcpp::export]]
arma::cube ssmooth(arma::mat A,
NumericVector B_) {
IntegerVector dimB=B_.attr("dim");
arma::cube B(B_.begin(), dimB[0], dimB[1], dimB[2]);
//(rest of your code unchanged...)
int ns = A.n_rows;
int nk = A.n_cols;
int np = B.n_rows;
arma::mat C = zeros<mat>(nk, ns);
arma::cube D = zeros<cube>(nk, nk, ns);
return D;
}
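The workaround relies on R arrays and Armadillo cubes both storing their elements in column-major order, so the flat memory behind B_ can be reinterpreted directly using the dim attribute. A minimal sketch of that layout in plain C++ (cube_offset is a hypothetical helper used only for illustration):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: offset of element (i, j, k) in a column-major
// array with dimensions n_rows x n_cols x n_slices.
std::size_t cube_offset(std::size_t i, std::size_t j, std::size_t k,
                        std::size_t n_rows, std::size_t n_cols,
                        std::size_t n_slices) {
    assert(i < n_rows && j < n_cols && k < n_slices);
    // Rows vary fastest, then columns, then slices.
    return i + n_rows * (j + n_cols * k);
}
```

In a 2 x 2 x 2 array, element (0, 1, 0) sits at offset 2 and element (0, 0, 1) at offset 4, which matches how R fills array(1:8, dim = c(2, 2, 2)).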
I think your code fails because implicitly it tries to do casting like this:
#include<RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
cube return_thing(SEXP thing1){
cube thing2 = as<cube>(thing1);
return thing2;
}
/***R
thing <- 1:8
dim(thing) <- c(2, 2, 2)
return_thing(thing)
*/
which doesn't work, whereas it works for matrices:
#include<RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::depends(RcppArmadillo)]]
//[[Rcpp::export]]
mat return_thing(SEXP thing1){
mat thing2 = as<mat>(thing1);
return thing2;
}
/***R
thing <- 1:4
dim(thing) <- c(2, 2)
return_thing(thing)
*/
I am able to read and return an arma cube with the following function:
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::cube return_cube(arma::cube X)
{
return(X);
}
For example, I obtain the following result when I run the following in R:
my_cube <- array(data = rnorm(5 * 3 * 2), dim = c(5,3, 2))
return_cube(my_cube)
, , 1
[,1] [,2] [,3]
[1,] 0.4815994 1.0863765 0.3278728
[2,] 1.4138699 -0.7809922 0.8341867
[3,] 0.6555752 -0.2708001 0.7701501
[4,] 1.1447104 -1.4064894 -0.2653888
[5,] 1.5972670 1.8368235 -2.2814959
, , 2
[,1] [,2] [,3]
[1,] -0.485091067 1.1826162 -0.3524851
[2,] 0.227652584 0.3005968 -0.6079604
[3,] -0.147653664 1.3463318 -1.2238623
[4,] 0.067090580 -0.8982740 -0.8903684
[5,] 0.006421618 -1.7156955 -1.2813880

rep(x, each=3) equivalent in armadillo

I'm porting an R function to C++ for use in RcppArmadillo, and I cannot find an elegant (efficient) way to repeat a column vector N times, element by element. Here's a minimal example, where I had to first create a matrix with 3 repeated columns, then reshape it to a row vector, then transpose.
library(RcppArmadillo)
sourceCpp(code = '
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::colvec foo(const arma::colvec& u, const arma::colvec& v)
{
arma::colvec u_rep(12), result(12);
u_rep = trans(vectorise(repmat(u, 1, 3), 1)); // this seems inefficient
result = u_rep % v;
return(result);
}'
)
foo(1:4, 1:12)
The R equivalent would be,
fooR = function(u, v){
u_rep = rep(u, each=3)
u_rep * v
}
There is no known C++ operator or function that does this, so you may well have to do it by hand.
Worst case you just loop and copy (possibly in chunks). Armadillo does have indexing, so maybe that will help. R does a lot of checking when recycling so you probably have to account for that too.
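A minimal sketch of the loop-and-copy idea, written against std::vector so it stands alone; the same indexing should carry over to arma::colvec. The name rep_each_times is hypothetical, and the sketch assumes v has exactly u.size() * each elements rather than reproducing R's recycling rules.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: element-wise product of rep(u, each = each) and v,
// computed without materialising the repeated vector.
std::vector<double> rep_each_times(const std::vector<double>& u,
                                   const std::vector<double>& v,
                                   std::size_t each) {
    // No recycling: the lengths must match exactly.
    assert(each > 0 && u.size() * each == v.size());
    std::vector<double> out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) {
        // Integer division maps positions 0..each-1 to u[0], and so on.
        out[i] = u[i / each] * v[i];
    }
    return out;
}
```

With u = 1:4, v = 1:12 and each = 3 this returns 1, 2, 3, 8, 10, 12, 21, 24, 27, 40, 44, 48.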
By the way, your example mixes Attributes and inline. I'd just put the code
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
arma::colvec foo(const arma::colvec& u, const arma::colvec& v) {
arma::colvec u_rep(12), result(12);
u_rep = trans(vectorise(repmat(u, 1, 3), 1)); // this seems inefficient
result = u_rep % v;
return(result);
}
in a file bafoo.cpp and source it as follows:
R> sourceCpp("/tmp/bafoo.cpp")
R> foo(1:4, 1:12)
[,1]
[1,] 1
[2,] 2
[3,] 3
[4,] 8
[5,] 10
[6,] 12
[7,] 21
[8,] 24
[9,] 27
[10,] 40
[11,] 44
[12,] 48
R>

Insertion of column in armadillo gives different results from insertion in R before passing in RcppArmadillo, what have I done wrong?

Here is the cpp file:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
List mylm1(NumericVector yr, NumericMatrix Xr) {
int n = Xr.nrow(), k = Xr.ncol();
arma::mat X(Xr.begin(), n, k, true);
Rcout << X << std::endl;
arma::mat allOne(n, 1, arma::fill::ones);
X.insert_cols(0, allOne);
Rcout << X << std::endl;
arma::colvec y(yr.begin(), yr.size(), true);
arma::colvec coef = arma::solve(X, y);
arma::colvec resid = y - X*coef;
double sig2 = arma::as_scalar( arma::trans(resid)*resid/(n-k) );
arma::colvec stderrest = arma::sqrt( sig2 * arma::diagvec( arma::inv(arma::trans(X)*X)) );
arma::colvec tval = coef / stderrest;
return Rcpp::List::create(
Rcpp::Named("coefficients") = coef,
Rcpp::Named("stderr") = stderrest,
Rcpp::Named("tval") = tval
) ;
}
// [[Rcpp::export]]
List mylm2(NumericVector yr, NumericMatrix Xr) {
int n = Xr.nrow(), k = Xr.ncol();
arma::mat X(Xr.begin(), n, k, false);
Rcout << X << std::endl;
arma::colvec y(yr.begin(), yr.size(), false);
arma::colvec coef = arma::solve(X, y);
arma::colvec resid = y - X*coef;
double sig2 = arma::as_scalar( arma::trans(resid)*resid/(n-k) );
arma::colvec stderrest = arma::sqrt( sig2 * arma::diagvec( arma::inv(arma::trans(X)*X)) );
return Rcpp::List::create(
Rcpp::Named("coefficients") = coef,
Rcpp::Named("stderr") = stderrest
) ;
}
Then in R:
sourceCpp('b.cpp')
set.seed(1)
x = matrix(rnorm(100), 25, 4)
y = rnorm(25)
mylm1(y, x)
mylm2(y, cbind(1, x))
summary(lm(y~x))
mylm1(y, x)
# 1.0000 -0.6265 -0.0561 0.3981 0.2914
# 1.0000 0.1836 -0.1558 -0.6120 -0.4433
# ...
# 1.0000 -1.9894 -0.1123 -0.9341 -1.2246
# 1.0000 0.6198 0.8811 -1.2536 -0.4734
#
#
# $stderr
# [,1]
# [1,] 0.16765
# [2,] 0.18009
# [3,] 0.24104
# [4,] 0.15117
# [5,] 0.21064
mylm2(y, cbind(1, x))
# 1.0000 -0.6265 -0.0561 0.3981 0.2914
# 1.0000 0.1836 -0.1558 -0.6120 -0.4433
# ...
# 1.0000 -1.9894 -0.1123 -0.9341 -1.2246
# 1.0000 0.6198 0.8811 -1.2536 -0.4734
#
# $stderr
# [,1]
# [1,] 0.17179
# [2,] 0.18454
# [3,] 0.24699
# [4,] 0.15490
# [5,] 0.21584
summary(lm(y~x))
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.1122 0.1718 0.65 0.52
# x1 0.0499 0.1845 0.27 0.79
# x2 0.1076 0.2470 0.44 0.67
# x3 -0.0435 0.1549 -0.28 0.78
# x4 -0.1750 0.2158 -0.81 0.43
According to the output of Rcout, the two methods generate the same X matrix in the end, but they don't give the same standard errors. Why?
Your formula for the standard errors is wrong. You forgot to adjust k for the added column. The fact that the matrices actually are the same should have made you a little suspicious about the rest of the code.
And I already mentioned to you in the comments to your last question that you need to treat R2 differently for the intercept and 'no intercept' cases. It is the same thing here with the standard errors, and the root cause of your problem here. In short, no problem with the matrix operations, but a problem when you extract and transform data later.
Take the hint and read the RcppArmadillo sources -- the combination of the fastLm.cpp code and the corresponding R functions is really short. Or else read up on the finer details of how to compute linear least squares and assorted analytics.
Oh, and I also have the same objection to your question title as before. RcppArmadillo is once again not at fault; your usage however is.
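To make the degrees-of-freedom point concrete: in mylm1, k is read from Xr before insert_cols() adds the intercept column, so sig2 is divided by n - k = 25 - 4 = 21 instead of 25 - 5 = 20. A minimal sketch of the arithmetic follows. Both helper names are hypothetical, and reading the column count from X after the insertion (for example via X.n_cols) is one possible fix, not necessarily the only one.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: residual degrees of freedom for a design matrix
// with n rows and p columns, where p includes the intercept column.
int residual_df(int n, int p) {
    assert(n > p);
    return n - p;
}

// Factor that converts standard errors computed with df_used into the
// ones that the correct degrees of freedom df_correct would give.
double stderr_correction(int df_used, int df_correct) {
    assert(df_used > 0 && df_correct > 0);
    return std::sqrt(static_cast<double>(df_used) / df_correct);
}
```

Multiplying the first mylm1 standard error, 0.16765, by stderr_correction(21, 20) gives approximately 0.17179, the value reported by mylm2 and by summary(lm(y~x)).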