Rcpp select/subset NumericMatrix column by a NumericVector

I can select all the rows and a range of columns of a matrix as follows:
library(Rcpp)
cppFunction('
NumericMatrix subset(NumericMatrix x){
return x(_, Range(0, 1));
}
')
However, I would like to select columns based on a NumericVector y which, for instance, could be something like c(0, 1, 0, 0, 1). I tried this:
library(Rcpp)
cppFunction('
NumericMatrix subset(NumericMatrix x, NumericVector y){
return x(_, y);
}
')
but it doesn't compile. How do I do it?

Alas, Rcpp doesn't have great support for non-contiguous views, such as selecting only columns 1 and 4 (0-based) in a single statement. As you saw, contiguous column ranges are accessible with Rcpp::Range(), and all rows or columns with Rcpp::_. You'll likely want to switch to RcppArmadillo for better control over matrix subsets.
RcppArmadillo subset examples
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat matrix_subset_idx(const arma::mat& x,
const arma::uvec& y) {
// y must be an integer between 0 and columns - 1
// Allows for repeated draws from same columns.
return x.cols( y );
}
// [[Rcpp::export]]
arma::mat matrix_subset_logical(const arma::mat& x,
const arma::vec& y) {
// Assumes that y is 0/1 coded.
// find() returns the indices of the elements of y that equal 1.
return x.cols( arma::find(y == 1) );
}
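For readers who want to see what `find(y == 1)` followed by `.cols()` does without Armadillo, here is a minimal plain-C++ sketch. It is not the Armadillo or Rcpp API: it turns a 0/1 mask into 0-based column indices and then copies those columns out of a column-major matrix, which is the storage layout both R and Armadillo use. The names `mask_to_indices` and `subset_cols_by_mask` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: return the 0-based positions where mask equals 1,
// mirroring what arma::find(y == 1) produces.
std::vector<std::size_t> mask_to_indices(const std::vector<double>& mask) {
  std::vector<std::size_t> idx;
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (mask[i] == 1.0) idx.push_back(i);
  }
  return idx;
}

// Hypothetical helper: keep the columns of a column-major nrow x ncol
// matrix whose mask entry is 1. Element (i, j) lives at data[i + j * nrow].
std::vector<double> subset_cols_by_mask(const std::vector<double>& data,
                                        std::size_t nrow,
                                        const std::vector<double>& mask) {
  // The mask needs exactly one entry per column.
  assert(nrow > 0 && data.size() == nrow * mask.size());
  std::vector<double> out;
  for (std::size_t j : mask_to_indices(mask)) {
    // Each column is a contiguous run of nrow values.
    out.insert(out.end(), data.begin() + j * nrow,
               data.begin() + (j + 1) * nrow);
  }
  return out;
}
```

With x = matrix(1:15, ncol = 5) stored as 1..15 and the mask {0, 1, 0, 0, 1}, the indices are {1, 4} and the result holds R's columns 2 and 5, i.e. {4, 5, 6, 13, 14, 15}.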
Test
# Sample data
x = matrix(1:15, ncol = 5)
x
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 4 7 10 13
# [2,] 2 5 8 11 14
# [3,] 3 6 9 12 15
# Subset only when 1 (TRUE) is found:
matrix_subset_logical(x, c(0, 1, 0, 0, 1))
# [,1] [,2]
# [1,] 4 13
# [2,] 5 14
# [3,] 6 15
# Subset with 0-based column indices (R columns 2 and 5)
# Note: C++ indices start at 0, not 1!
matrix_subset_idx(x, c(1, 4))
# [,1] [,2]
# [1,] 4 13
# [2,] 5 14
# [3,] 6 15
Pure Rcpp logic
If you would rather not take on the Armadillo dependency, the equivalent index-based subset in plain Rcpp is:
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::NumericMatrix matrix_subset_idx_rcpp(
    Rcpp::NumericMatrix x, Rcpp::IntegerVector y) {
  // y holds 0-based column indices; one output column per entry
  int n_cols_out = y.size();
  // Create the output matrix without initializing its values
  Rcpp::NumericMatrix out = Rcpp::no_init(x.nrow(), n_cols_out);
  // Loop through each requested column and copy the data
  for (int z = 0; z < n_cols_out; ++z) {
    out(Rcpp::_, z) = x(Rcpp::_, y[z]);
  }
  return out;
}
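The loop above trusts that every y[z] is a valid 0-based column index; an out-of-range value reads past the end of x. Below is a plain-C++ sketch of the same column-copy loop with an explicit range check (the name `subset_cols_checked` is hypothetical and this is not Rcpp code). In the Rcpp version you would probably report the error with Rcpp::stop() instead of throwing std::out_of_range.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: copy the columns listed in idx (0-based, repeats
// allowed) from a column-major nrow x ncol matrix into a new matrix.
std::vector<double> subset_cols_checked(const std::vector<double>& x,
                                        std::size_t nrow, std::size_t ncol,
                                        const std::vector<int>& idx) {
  assert(x.size() == nrow * ncol);
  std::vector<double> out(nrow * idx.size());
  for (std::size_t z = 0; z < idx.size(); ++z) {
    // Reject indices outside [0, ncol) instead of reading out of bounds.
    if (idx[z] < 0 || static_cast<std::size_t>(idx[z]) >= ncol) {
      throw std::out_of_range("column index " + std::to_string(idx[z]) +
                              " is outside [0, " + std::to_string(ncol) + ")");
    }
    const std::size_t src = static_cast<std::size_t>(idx[z]) * nrow;
    // Same as out(_, z) = x(_, y[z]) in the Rcpp version.
    for (std::size_t i = 0; i < nrow; ++i) {
      out[i + z * nrow] = x[src + i];
    }
  }
  return out;
}
```

For the 3 x 5 example matrix, indices {1, 4} give {4, 5, 6, 13, 14, 15}, repeated indices such as {0, 0} duplicate a column, and {5} throws.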

Related

Submatrix summation doesn't work for NumericMatrix Rcpp

I observed the following weird situation when defining an Rcpp function.
The first function works and calculates the sum of the submatrix s:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
double Sub(NumericMatrix m){
NumericMatrix s = m(Range(0,0),Range(0,1));
return sum(s);
}
However, when I change the code in the following way:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
double Sub(NumericMatrix m){
return sum(m(Range(0,0),Range(0,1)));
}
It doesn't work, which is really counterintuitive. The error I get is: no matching function for call to 'sum(Rcpp::Matrix<14>::Sub)'
The only difference is that I skip the step where I define the submatrix NumericMatrix s.
A matrix that can be used to check this is the following:
n = matrix(c(6090,16,0,0,618,1036,3,0,99,0,312,4,25,0,0,3,0,0,0,0,1794,0,0,0,0),5,5,byrow=TRUE)
n
[,1] [,2] [,3] [,4] [,5]
[1,] 6090 16 0 0 618
[2,] 1036 3 0 99 0
[3,] 312 4 25 0 0
[4,] 3 0 0 0 0
[5,] 1794 0 0 0 0
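The error message shows that m(Range(0,0), Range(0,1)) yields an Rcpp::Matrix<14>::Sub object, and no sum() overload matches that type; the first version works because assigning it to NumericMatrix s produces an ordinary matrix first. If only the total is needed, the block can also be summed in place. Below is a plain-C++ sketch of that idea (not the Rcpp API; the name `block_sum` is hypothetical) over column-major storage, with inclusive 0-based row and column bounds like Range(). For the matrix n above, rows 0..0 and columns 0..1 give 6090 + 16 = 6106.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sum rows r0..r1 and columns c0..c1 (inclusive,
// 0-based, like Rcpp::Range) of a column-major matrix with nrow rows,
// without building an intermediate submatrix.
double block_sum(const std::vector<double>& m, std::size_t nrow,
                 std::size_t r0, std::size_t r1,
                 std::size_t c0, std::size_t c1) {
  assert(r0 <= r1 && r1 < nrow && c0 <= c1);
  assert((c1 + 1) * nrow <= m.size());
  double total = 0.0;
  for (std::size_t j = c0; j <= c1; ++j) {
    for (std::size_t i = r0; i <= r1; ++i) {
      total += m[i + j * nrow];  // element (i, j)
    }
  }
  return total;
}
```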

Understanding strange behavior in Rcpp

I have run into something I cannot wrap my head around. It's part of a larger coding effort, but a minimal example is here:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
Rcpp::List foo(arma::vec & tau2, const arma::vec & nu) {
arma::vec bet = Rcpp::rnorm(3);
tau2 = R::rgamma(1, arma::as_scalar(sum(pow(bet, 2)/nu)));
return Rcpp::List::create(Rcpp::Named("nu") = nu,
Rcpp::Named("tau2") = tau2);
}
(tau2, although a scalar, is a vector here because I want to pass it by reference; see the question "function pass by reference in RcppArmadillo".)
What is puzzling me is that if I now run the following R code:
n <- 3
m <- matrix(0, n, 1)
for (r in 1:1000) {
tau2 <- 1.0
nu <- matrix(1, n, 1)
upd <- foo(tau2, nu)
}
I get:
error: element-wise division: incompatible matrix dimensions: 3x1 and 18x1
Error in foo(tau2, nu) :
element-wise division: incompatible matrix dimensions: 3x1 and 18x1
where the 18x1 varies; mostly it's 0x1 but it's always a multiple of 3.
Looking at the output:
> nu
[,1] [,2] [,3] [,4]
[1,] 4.165242 4.165242 4.165242 4.165242
[2,] 4.165242 4.165242 4.165242 4.165242
[3,] 4.165242 4.165242 4.165242 4.165242
> upd
$nu
[,1]
[1,] 1
[2,] 1
[3,] 1
$tau2
[,1]
[1,] 4.165242
That is, despite declaring nu as a constant reference (which I do because I do not want it changed), it is altered. The value it is filled with is upd$tau2 (but why?).
Strangely, I can make the behavior go away with any of these seemingly meaningless changes:
putting tau2 <- 1.0 or nu <- matrix(1, n, 1) (or both) outside of the loop
removing the reference in the first argument (i.e. using arma::vec tau2)
not dividing pow(bet, 2) by nu
changing to nu <- rep(1, n)
Perhaps the most confusing part is that if I select the code chunk inside of the loop and repeatedly run it, it works(!). However, if I run the R code using the loop it crashes on the second iteration.
Because I seem to be able to fix the problem, I'm mostly interested in learning what is going on here. I suspect it's just a consequence of my lack of expertise in C++ and recklessness with various variable types, so knowing what is causing all of this would be very valuable.
Two fixes:
tau2 is a double (mimicking Dirk Eddelbuettel here)
A temporary variable holds the NumericVector of length n generated by Rcpp::rnorm() before it is copied into bet
Code:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
Rcpp::List foo(double tau2, const arma::vec & nu) {
  int n = nu.n_elem;
  // Store the rnorm() draws in a named NumericVector first...
  Rcpp::NumericVector x = Rcpp::rnorm(n);
  // ...then copy them into an arma::vec (copy_aux_mem = true, strict = false)
  arma::vec bet = arma::vec(x.begin(), n, true, false);
  tau2 = R::rgamma(1, arma::as_scalar(sum(pow(bet, 2) / nu)));
  return Rcpp::List::create(Rcpp::Named("nu") = nu,
                            Rcpp::Named("tau2") = tau2);
}
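The fix amounts to treating tau2 as a plain value that the function computes and returns, while nu is only read. Here is a minimal standard-library sketch of the same update, under the assumption that nu is strictly positive. It is not Rcpp: std::normal_distribution and std::gamma_distribution stand in for Rcpp::rnorm and R::rgamma, so the draws will not match R's. Like R::rgamma(shape, scale), std::gamma_distribution takes a shape and a scale. The name `draw_tau2` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch of the tau2 update: draw bet ~ N(0, 1) with one entry
// per element of nu, then draw tau2 ~ Gamma(shape = 1, scale = sum(bet^2 / nu)).
// nu is taken by const reference and only read; the new tau2 is returned.
double draw_tau2(const std::vector<double>& nu, std::mt19937& rng) {
  assert(!nu.empty());
  std::normal_distribution<double> std_normal(0.0, 1.0);
  double scale = 0.0;
  for (double v : nu) {
    assert(v > 0.0);  // assumption: nu is strictly positive
    const double b = std_normal(rng);
    scale += b * b / v;
  }
  // std::gamma_distribution takes (shape, scale), like R::rgamma.
  std::gamma_distribution<double> gamma_dist(1.0, scale);
  return gamma_dist(rng);
}
```

Returning the new value, rather than writing through a reference argument, keeps the caller's data untouched across repeated calls, as in the 1000-iteration loop from the question.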
Test case:
n <- 3
m <- matrix(0, n, 1)
for (r in 1:1000) {
tau2 <- 1.0
nu <- matrix(1, n, 1)
upd <- foo(tau2, nu)
}
upd
#> $nu
#> [,1]
#> [1,] 1
#> [2,] 1
#> [3,] 1
#>
#> $tau2
#> [1] 3.292889
If I change the interface to use a double, it all works:
Code
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
Rcpp::List foo(double & tau2, const arma::vec & nu) {
arma::vec bet = Rcpp::rnorm(3);
tau2 = R::rgamma(1, arma::as_scalar(sum(pow(bet, 2)/nu)));
return Rcpp::List::create(Rcpp::Named("nu") = nu,
Rcpp::Named("tau2") = tau2);
}
/*** R
n <- 3
m <- matrix(0, n, 1)
for (r in 1:1000) {
tau2 <- 1.0
nu <- matrix(1, n, 1)
upd <- foo(tau2, nu)
}
*/
Demo
R> sourceCpp("/tmp/hejseb.cpp")
R> n <- 3
R> m <- matrix(0, n, 1)
R> for (r in 1:1000) {
+ tau2 <- 1.0
+ nu <- matrix(1, n, 1)
+ upd <- foo(tau2, nu)
+ }
R> upd
$nu
[,1]
[1,] 1
[2,] 1
[3,] 1
$tau2
[1] 1.77314
R>
I am not sure if those are the numbers you expected. I don't really have time to work through what you are trying to do.

Column-wise ordering of matrix

I am new to RcppArmadillo. I am wondering how I can reorder a matrix column-wise by the indices in a given vector. I know how to do it in R, but my RcppArmadillo attempts do not work. For example, in R:
aa = c(2,4,1,3)
# [1] 2 4 1 3
bb = cbind(c(1,5,4,2),c(3,1,0,8))
# [,1] [,2]
# [1,] 1 3
# [2,] 5 1
# [3,] 4 0
# [4,] 2 8
Subsetting in R gives:
cc = bb[aa,]
# [,1] [,2]
# [1,] 5 1
# [2,] 2 8
# [3,] 1 3
# [4,] 4 0
I've tried the following using RcppArmadillo:
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
List example(arma::vec aa,arma::mat bb){
int p = bb.n_rows;
int n = aa.size();
arma::uvec index_aa=sort_index(aa);;
List cc(n);
for(int it=0; it<p; it++){
cc(it) = bb.each_col();
}
return List::create(cc);
}
and,
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
List example(arma::vec aa,arma::mat bb){
arma::uvec index_aa=sort_index(aa);
return List::create(bb.elem(index_aa));
}
I'm not sure why you are sorting the index here, as that introduces a different order from the one used by bb[aa,].
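To see why the order changes, here is a plain-C++ sketch comparing the two index vectors for aa = c(2, 4, 1, 3). The helper names are hypothetical; `sort_index_sketch` mimics an ascending sort_index, i.e. the 0-based positions that would sort aa. That gives {2, 0, 3, 1}, which (when aa is a permutation of 1..n) is the inverse permutation, whereas bb[aa,] needs aa - 1 = {1, 3, 0, 2}.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper mimicking an ascending sort_index: the 0-based
// positions that would put aa in increasing order.
std::vector<std::size_t> sort_index_sketch(const std::vector<double>& aa) {
  std::vector<std::size_t> idx(aa.size());
  std::iota(idx.begin(), idx.end(), std::size_t{0});
  std::stable_sort(idx.begin(), idx.end(),
                   [&aa](std::size_t a, std::size_t b) { return aa[a] < aa[b]; });
  return idx;
}

// Hypothetical helper: convert R's 1-based indices to 0-based ones.
std::vector<std::size_t> r_to_cpp_index(const std::vector<double>& aa) {
  std::vector<std::size_t> idx;
  idx.reserve(aa.size());
  for (double a : aa) {
    assert(a >= 1.0);  // R indices start at 1
    idx.push_back(static_cast<std::size_t>(a) - 1);
  }
  return idx;
}
```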
Anyway, the idea here is to subset with .rows(), which requires a uvec (a vector of unsigned integers). Because aa contains R indices, we translate them to C++ indices by subtracting 1, moving from R's 1-based indexing to C++'s 0-based indexing.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat example_subset(arma::uvec aa, arma::mat bb){
// Convert to a C++ index from R (1 to 0-based indices)
aa = aa - 1;
return bb.rows(aa);
}
Test code:
aa = c(2, 4, 1, 3)
bb = cbind(c(1, 5, 4, 2), c(3, 1, 0, 8))
cpp_cc = example_subset(aa, bb)
r_cc = cbind(c(5,2,1,4),c(1,8,3,0))
all.equal(cpp_cc, r_cc)
# [1] TRUE
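The same row subset can be sketched in plain C++ on column-major storage, which may help when checking the index arithmetic. This is not Armadillo's implementation; `gather_rows` is a hypothetical name, and it takes R's 1-based indices and subtracts 1 internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: equivalent of bb[aa, ] in R for a column-major
// nrow x ncol matrix. aa holds 1-based R row indices (repeats allowed).
std::vector<double> gather_rows(const std::vector<double>& bb,
                                std::size_t nrow, std::size_t ncol,
                                const std::vector<std::size_t>& aa) {
  assert(bb.size() == nrow * ncol);
  const std::size_t nout = aa.size();
  std::vector<double> out(nout * ncol);
  for (std::size_t j = 0; j < ncol; ++j) {
    for (std::size_t i = 0; i < nout; ++i) {
      assert(aa[i] >= 1 && aa[i] <= nrow);
      // Subtract 1 to go from R's 1-based index to C++'s 0-based index.
      out[i + j * nout] = bb[(aa[i] - 1) + j * nrow];
    }
  }
  return out;
}
```

For bb = cbind(c(1, 5, 4, 2), c(3, 1, 0, 8)) and aa = c(2, 4, 1, 3) this yields the columns {5, 2, 1, 4} and {1, 8, 3, 0}, matching r_cc above.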

Concatenate NumericMatrix with Rcpp

I would like to collapse the rows of a transposed NumericMatrix using Rcpp. For instance:
library("data.table")
library("Rcpp")
dt1 <- data.table(V1=c(1, 0, 2),
V2=c(1, 1, 0),
V3=c(1, 0, 1),
V4=c(0, 1, 2),
V5=c(1, 1, 1))
cppFunction('NumericMatrix transpose(DataFrame data) {
NumericMatrix genotypes = internal::convert_using_rfunction(data, "as.matrix");
NumericMatrix tgeno(data.ncol(), data.nrow());
int number_samples = data.ncol();
int number_snps = data.nrow();
for (int i = 0; i < number_snps; i++) {
for (int j = 0; j < number_samples; j++) {
tgeno(j,i) = genotypes(i,j);
}
}
return tgeno;
}')
dt1
transpose(dt1)
Original Matrix
V1 V2 V3 V4 V5
1: 1 1 1 0 1
2: 0 1 0 1 1
3: 2 0 1 2 1
Transposed Matrix
[,1] [,2] [,3]
[1,] 1 0 2
[2,] 1 1 0
[3,] 1 0 1
[4,] 0 1 2
[5,] 1 1 1
I would like to have the following matrix:
[,1]
[1,] 102
[2,] 110
[3,] 101
[4,] 012
[5,] 111
Could anyone suggest a way to do this?
Maybe as a starting point, assuming that each number you concatenate is a single digit:
#include <Rcpp.h>
#include <string>
#include <vector>
//' @export
// [[Rcpp::export]]
std::vector<std::string> string_collapse(const Rcpp::DataFrame& data)
{
  R_xlen_t nrow = data.nrow();
  R_xlen_t ncol = data.ncol();
  // One output string per column of the data frame
  std::vector<std::string> ret(ncol);
  for (R_xlen_t j = 0; j < ncol; ++j) {
    const auto& col = Rcpp::as<Rcpp::NumericVector>(data[j]);
    std::string ccstr;
    ccstr.reserve(nrow);
    for (const auto& chr: col) {
      // std::to_string(1.0) is "1.000000"; keep only its first character
      ccstr += std::to_string(chr)[0];
    }
    ret[j] = ccstr;
  }
  return ret;
}
It gives:
dat <- data.frame(V1=c(1, 0, 2),
V2=c(1, 1, 0),
V3=c(1, 0, 1),
V4=c(0, 1, 2),
V5=c(1, 1, 1))
string_collapse(dat)
[1] "102" "110" "101" "012" "111"
But a quick benchmark comparing it to a pure R solution suggests that you should not expect miracles; there is probably still room for optimization.
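If the values can have more than one digit, one possible variation (an assumption, not part of the answer above) is to format each value as a whole number with std::to_string on a long long instead of taking the first character of the double's string. Concatenating multi-digit values is ambiguous (1 and 12 give the same string as 11 and 2), so the sketch also accepts an optional separator. It works on column-major data in plain C++, assumes every entry is a whole number, and the name `collapse_columns` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: for a column-major nrow x ncol matrix of whole
// numbers, concatenate each column's values into one string.
std::vector<std::string> collapse_columns(const std::vector<double>& data,
                                          std::size_t nrow, std::size_t ncol,
                                          const std::string& sep = "") {
  assert(data.size() == nrow * ncol);
  std::vector<std::string> ret(ncol);
  for (std::size_t j = 0; j < ncol; ++j) {
    std::string s;
    for (std::size_t i = 0; i < nrow; ++i) {
      const double v = data[i + j * nrow];
      assert(v == std::floor(v));  // assumption: whole numbers only
      if (i > 0) s += sep;
      // Format as an integer, so 12 becomes "12" rather than "12.000000".
      s += std::to_string(static_cast<long long>(v));
    }
    ret[j] = s;
  }
  return ret;
}
```

With the example data (V1 to V5 stored column by column) and no separator this reproduces "102" "110" "101" "012" "111".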
Once you have transposed the matrix, you can collapse its rows as follows:
matrix(apply(transpose(dt1), 1, paste0, collapse = ""), ncol = 1)

Is there a c++ matrix library where I can index matrices with non-contiguous vectors as in R?

I believe Boost is limited to contiguous, or at least constant-stride, slicing of matrices. In R, I could take an arbitrary vector such as c(5,2,8) and use it to index into a matrix, for example M[c(5,2,8),].
Armadillo supports this as of version 3.0, which had been released less than two weeks before this answer was written.
Here is a worked example via RcppArmadillo:
R> library(inline)
R>
R> code <- '
+ arma::mat M = Rcpp::as<arma::mat>(m); // normal matrix
+ arma::uvec V = Rcpp::as<arma::uvec>(v); // unsigned int vec
+ arma::mat N = M.cols(V); // index matrix by vec
+ return Rcpp::wrap(N);
+ '
R>
R> fun <- cxxfunction(signature(m="numeric", v="integer"),
+ code,
+ plugin="RcppArmadillo")
R> M <- matrix(1:25,5,5)
R> V <- c(1L, 3L, 5L) - 1 # offset by one for zero indexing
R> fun(M, V)
[,1] [,2] [,3]
[1,] 1 11 21
[2,] 2 12 22
[3,] 3 13 23
[4,] 4 14 24
[5,] 5 15 25
R>
There is a matching function, .rows(), to pick rows rather than columns.
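The Armadillo route above returns the selected columns as a new matrix. To illustrate the idea of R-style non-contiguous indexing in C++ without copying, here is a minimal sketch of a hypothetical RowIndexView class (not part of Armadillo, Boost or Rcpp). It keeps a pointer to column-major data plus a list of 0-based row indices, and translates each (i, j) access, so view(i, j) reads M[rows[i], j]. The viewed matrix must outlive the view.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical non-owning view that behaves like M[rows, ] in R:
// view(i, j) reads M(rows[i], j) from column-major storage without copying.
class RowIndexView {
 public:
  RowIndexView(const std::vector<double>& data, std::size_t nrow,
               std::vector<std::size_t> rows)  // 0-based row indices
      : data_(&data), nrow_(nrow), rows_(std::move(rows)) {
    assert(nrow_ > 0 && data.size() % nrow_ == 0);
    for (std::size_t r : rows_) assert(r < nrow_);
  }
  std::size_t n_rows() const { return rows_.size(); }
  std::size_t n_cols() const { return data_->size() / nrow_; }
  double operator()(std::size_t i, std::size_t j) const {
    assert(i < rows_.size() && j < n_cols());
    return (*data_)[rows_[i] + j * nrow_];
  }

 private:
  const std::vector<double>* data_;  // viewed matrix; must outlive the view
  std::size_t nrow_;
  std::vector<std::size_t> rows_;
};
```

For M <- matrix(1:25, 5, 5), RowIndexView(M, 5, {4, 1, 2}) behaves like M[c(5, 2, 3), ]: its (0, 0) element is 5 and its (1, 2) element is 12.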