I am passing a 3D array from R into C++ and have run into type conversion issues. How can arma::cube subviews from RcppArmadillo be converted into NumericVectors so that Rcpp sugar functions such as which_min can operate on them?
Say you have a 3D cube Q with some numeric entries. My goal is to get the index of the minimum value of the column entries for each row i and for each third dimension k. In R syntax this is which.min(Q[i,,k]).
For example for i = 1 and k = 1
cube Q = randu<cube>(3,3,3);
which_min(Q.slice(1).row(1)); // this fails
I thought a conversion to NumericVector would do the trick, but this conversion fails
which_min(as<NumericVector>(Q.slice(1).row(1))); // conversion failed
How can I get this to work? Thank you for your help.
You have a couple of options here:
You can just use the Armadillo function for this, the member function .index_min() (see Armadillo documentation here).
You can use Rcpp::wrap(), which "transforms an arbitrary object into a SEXP" to turn the arma::cube subviews into a Rcpp::NumericVector and use the sugar function Rcpp::which_min().
Initially I gave only the first option as the answer, since it is the more direct way to accomplish your objective. I added the second option (in an update to the answer) because arbitrary conversions may be part of what you're curious about.
I put the following C++ code in a file so-answer.cpp:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// [[Rcpp::export]]
Rcpp::List index_min_test() {
    arma::cube Q = arma::randu<arma::cube>(3, 3, 3);
    int whichmin = Q.slice(1).row(1).index_min();
    Rcpp::List result = Rcpp::List::create(Rcpp::Named("Q") = Q,
                                           Rcpp::Named("whichmin") = whichmin);
    return result;
}

// [[Rcpp::export]]
Rcpp::List which_min_test() {
    arma::cube Q = arma::randu<arma::cube>(3, 3, 3);
    Rcpp::NumericVector x = Rcpp::wrap(Q.slice(1).row(1));
    int whichmin = Rcpp::which_min(x);
    Rcpp::List result = Rcpp::List::create(Rcpp::Named("Q") = Q,
                                           Rcpp::Named("whichmin") = whichmin);
    return result;
}
We have one function that uses Armadillo's .index_min() and one that uses Rcpp::wrap() to enable the use of Rcpp::which_min(). Both return a zero-based index, so a result of 0 means the first column.
Then I use Rcpp::sourceCpp() to compile it, make the functions available to R, and demonstrate calling them with a couple of different seeds:
Rcpp::sourceCpp("so-answer.cpp")
set.seed(1)
arma <- index_min_test()
set.seed(1)
wrap <- which_min_test()
arma$Q[2, , 2]
#> [1] 0.2059746 0.3841037 0.7176185
wrap$Q[2, , 2]
#> [1] 0.2059746 0.3841037 0.7176185
arma$whichmin
#> [1] 0
wrap$whichmin
#> [1] 0
set.seed(2)
arma <- index_min_test()
set.seed(2)
wrap <- which_min_test()
arma$Q[2, , 2]
#> [1] 0.5526741 0.1808201 0.9763985
wrap$Q[2, , 2]
#> [1] 0.5526741 0.1808201 0.9763985
arma$whichmin
#> [1] 1
wrap$whichmin
#> [1] 1
library(microbenchmark)
microbenchmark(arma = index_min_test(), wrap = which_min_test())
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> arma 12.981 13.7105 15.09386 14.1970 14.9920 62.907 100 a
#> wrap 13.636 14.3490 15.66753 14.7405 15.5415 64.189 100 a
Created on 2018-12-21 by the reprex package (v0.2.1)
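For readers who want to see what the lookup does without Armadillo or Rcpp, here is a minimal standard-library sketch. It assumes the cube is stored column-major in a flat vector, as R arrays and Armadillo cubes are. The helper name row_argmin and its signature are made up for illustration and are not part of either library. Like .index_min() and Rcpp::which_min(), it returns a zero-based index.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not an Armadillo or Rcpp function): zero-based column
// index of the smallest value in Q(i, :, k). `cube` holds an
// n_rows x n_cols x n_slices array in column-major order, so element
// (i, j, k) lives at i + j * n_rows + k * n_rows * n_cols.
std::size_t row_argmin(const std::vector<double>& cube,
                       std::size_t n_rows, std::size_t n_cols,
                       std::size_t n_slices,
                       std::size_t i, std::size_t k) {
    if (n_cols == 0 || i >= n_rows || k >= n_slices ||
        cube.size() != n_rows * n_cols * n_slices) {
        throw std::invalid_argument("bad dimensions or indices");
    }
    const std::size_t base = i + k * n_rows * n_cols;  // offset of Q(i, 0, k)
    std::size_t best = 0;
    for (std::size_t j = 1; j < n_cols; ++j) {
        // Strict '<' keeps the first minimum when there are ties.
        if (cube[base + j * n_rows] < cube[base + best * n_rows]) {
            best = j;
        }
    }
    return best;
}
```

With a 3 x 3 x 3 cube q, the call row_argmin(q, 3, 3, 3, 1, 1) looks at the same entries as Q[2, , 2] in R and returns the R position minus one.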
I have a problem with the "stackApply" function from the raster-package. First I want to stack three raster layers (each layer has one band) - that works. And then I want to create a raster-object that shows in which of the three bands/layers the minimum value occurs (each pixel in the raster layers has a different value). But I get various error messages. Does anyone have an idea how I can solve the problem?
Thank you
stacktest<-stack(test,test1,test2)
min_which <- stackApply(stacktest, indices=1, fun=function(x, na.rm=NULL)which.min(x))
Error in setValues(out, v) : values must be a vector
Error in is.infinite(v) : not implemented standard method for type 'list'
Here is a minimal, self-contained, reproducible example:
Example data from ?stackApply
library(raster)
r <- raster(ncol=10, nrow=10)
values(r) <- 1:ncell(r)
s <- stack(r,r,r,r,r,r)
s <- s * 1:6
Now use these data with your function (I removed the na.rm=NULL as it is not used)
w <- stackApply(s, indices=1, fun=function(x, ...) which.min(x) )
w
#class : RasterLayer
#dimensions : 10, 10, 100 (nrow, ncol, ncell)
#resolution : 36, 18 (x, y)
#extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
#crs : +proj=longlat +datum=WGS84 +no_defs
#source : memory
#names : index_1
#values : 1, 1 (min, max)
Same for which.max
w <- stackApply(s, indices=1, fun=function(x, ...) which.max(x) )
w
# (...)
#values : 6, 6 (min, max)
This suggests that the approach itself works. In most cases that means you probably have cells that are NA in every layer:
s[1:10] <- NA
w <- stackApply(s, indices=1, fun=function(x, ...) which.min(x) )
# Error in setValues(out, v) : values must be numeric, logical or factor
It is easy to see why this error occurs
which.min(3:1)
#[1] 3
which.min(c(3:1, NA))
#[1] 3
which.min(c(NA, NA, NA))
#integer(0)
If all values are NA, which.min does not return NA as one might expect; it returns an empty vector. Indexing the result with [1] turns that empty vector into NA:
which.min(c(NA, NA, NA))[1]
#[1] NA
And you can do
w <- stackApply(s, indices=1, fun=function(x, ...) which.min(x)[1] )
However, using stackApply with indices=1 is not a good approach. You should generally use calc to compute cell values across all layers.
y <- calc(s, function(x) which.min(x)[1])
But in this case you can use the more straightforward
z <- which.min(s)
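The same guard against the all-missing case can be sketched outside R. The following plain C++ function is only an illustration: NaN stands in for NA, the name which_min_skip_nan is made up, and returning 0 for "nothing to compare" is an assumed convention chosen here in place of R's empty vector. Like which.min, it skips missing values and reports a 1-based position.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical analogue of R's which.min(): 1-based position of the first
// minimum, skipping NaN. Returns 0 (an assumed sentinel) when the input is
// empty or contains only NaN, so every cell still gets exactly one value.
std::size_t which_min_skip_nan(const std::vector<double>& x) {
    std::size_t best = 0;  // 0 means "no non-NaN value seen yet"
    for (std::size_t j = 0; j < x.size(); ++j) {
        if (std::isnan(x[j])) {
            continue;  // ignore missing values, as which.min does
        }
        if (best == 0 || x[j] < x[best - 1]) {
            best = j + 1;  // store the 1-based position
        }
    }
    return best;
}
```

The design point is the same as which.min(x)[1]: the function always returns a single value per cell, so the caller never has to handle a zero-length result.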
I have a C++ function that is called inside an R function using the Rcpp package. The R function accepts an inputDataFrame and uses the C++ function (which also accepts a DataFrame) to calculate drug amounts (A1) as a function of time. R then returns the inputDataFrame with an added column for the calculated amounts A1.
I am having trouble building an R package for this function. I followed the RStudio instructions, but I ran into an error when building the package. The error is in the RcppExport.cpp file and states that 'OneCompIVbolusCpp' was not declared in this scope.
Here is the code for the C++ and R functions. They work perfectly fine in R when I process an example data frame.
Rfunction OneCompIVbolus_Rfunction.R:
library(Rcpp)
sourceCpp("OneCompIVbolusCppfunction.cpp")
OneCompIVbolusRCpp <- function(inputDataFrame){
inputDataFrame$A1[inputDataFrame$TIME==0] <- inputDataFrame$AMT[inputDataFrame$TIME==0]
OneCompIVbolusCpp( inputDataFrame )
inputDataFrame
}
C++ function OneCompIVbolusCppfunction.cpp:
#include <Rcpp.h>
#include <math.h>
#include <iostream>
using namespace Rcpp;
using namespace std;
// [[Rcpp::export]]
// input Dataframe from R
DataFrame OneCompIVbolusCpp(DataFrame inputFrame){
// Create vectors of each element used in function and for constructing output dataframe
Rcpp::DoubleVector TIME = inputFrame["TIME"];
Rcpp::DoubleVector AMT = inputFrame["AMT"];
Rcpp::DoubleVector k10 = inputFrame["k10"];
Rcpp::DoubleVector A1 = inputFrame["A1"];
double currentk10, currentTime, previousA1, currentA1;
// in C++ arrays start at index 0, so to start at 2nd row need to set counter to 1
// for counter from 1 to the number of rows in input data frame
for(int counter = 1; counter < inputFrame.nrows(); counter++){
// pull out all the variables that will be used for calculation
currentk10 = k10[ counter ];
currentTime = TIME[ counter ] - TIME[ counter - 1];
previousA1 = A1[ counter - 1 ];
// Calculate currentA1
currentA1 = previousA1*exp(-currentTime*currentk10);
// Fill in Amounts and check for other doses
A1[ counter ] = currentA1 + AMT[ counter ];
} // end for loop
return(0);
}
Any hints on what I am doing wrong here? How can I solve this issue?
Edit:
Here is an example of running the composite function OneCompIVbolusRCpp in R:
library(plyr)
library(Rcpp)
source("OneCompIVbolus_Rfunction.R")
#-------------
# Generate df
#-------------
#Set dose records:
dosetimes <- c(0,12)
#set number of subjects
ID <- 1:2
#Make dataframe
df <- expand.grid("ID"=ID,"TIME"=sort(unique(c(seq(0,24,1),dosetimes))),"AMT"=0,"MDV"=0,"CL"=2,"V"=10)
doserows <- subset(df, TIME%in%dosetimes)
#Dose = 100 mg, Dose 1 at time 0
doserows$AMT[doserows$TIME==dosetimes[1]] <- 100
#Dose 2 at 12
doserows$AMT[doserows$TIME==dosetimes[2]] <- 50
#Add back dose information
df <- rbind(df,doserows)
df <- df[order(df$ID,df$TIME,df$AMT),] # arrange df by TIME (ascending) and by AMT (descending)
df <- subset(df, (TIME==0 & AMT==0)==F) # remove the row that has a TIME=0 and AMT=0
df$k10 <- df$CL/df$V
#-------------
# Apply the function
#-------------
simdf <- ddply(df, .(ID), OneCompIVbolusRCpp)
You may simply have the wrong ordering. Instead of
// [[Rcpp::export]]
// input Dataframe from R
DataFrame OneCompIVbolusCpp(DataFrame inputFrame){
// ...
do
// input Dataframe from R
// [[Rcpp::export]]
DataFrame OneCompIVbolusCpp(DataFrame inputFrame){
// ...
as the [[Rcpp::export]] tag must come directly before the function it exports.
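Independently of the attribute ordering, the recurrence that the loop in the question computes can be written as a small standalone function that returns the amounts instead of modifying its input. This is a sketch in plain C++ with no Rcpp types. The name one_comp_iv_bolus and the vector-based signature are assumptions for illustration, and it assumes the first record is the dose at time 0, which is what the R wrapper in the question sets up.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical standalone version of the loop in the question:
//   A1[0] = AMT[0]                                    (assumed dose at time 0)
//   A1[t] = A1[t-1] * exp(-(TIME[t] - TIME[t-1]) * k10[t]) + AMT[t]
std::vector<double> one_comp_iv_bolus(const std::vector<double>& time,
                                      const std::vector<double>& amt,
                                      const std::vector<double>& k10) {
    const std::size_t n = time.size();
    if (amt.size() != n || k10.size() != n) {
        throw std::invalid_argument("time, amt and k10 must have equal length");
    }
    std::vector<double> a1(n, 0.0);
    if (n == 0) {
        return a1;
    }
    a1[0] = amt[0];
    for (std::size_t t = 1; t < n; ++t) {
        const double dt = time[t] - time[t - 1];
        // Decay the previous amount over dt, then add any dose given at t.
        a1[t] = a1[t - 1] * std::exp(-dt * k10[t]) + amt[t];
    }
    return a1;
}
```

Returning the computed vector makes the result explicit, so the caller does not depend on the input columns being changed in place.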
I use rolling regression in R quite a lot and my initial setup is something like:
dolm <- function(x) coef(lm(x[,1] ~ x[,2] + 0, data = as.data.frame(x)))
rollingCoef = rollapply(someData, 100, dolm)
The above example works perfectly, except that it is slow if you have a lot of iterations.
To speed it up I decided to experiment with the Rcpp package.
First I substituted lm with fastLm; the result is a bit faster but still slow. That pushed me to write the entire rolling-regression coefficient function in C++ as a for loop and then integrate it into R with Rcpp.
So I've changed original RcppArmadillo's function fastLm to this:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::export]]
List rollCoef(const arma::mat& X, const arma::colvec& y, double window ) {
double cppWindow = window - 1;
double matRows = X.n_rows;
double matCols = X.n_cols - 1;
arma::mat coef( matRows - cppWindow, X.n_cols); // matrix for estimated coefficients
//for loop for rolling regression.
for( double i = 0 ; i < matRows - cppWindow ; i++ )
{
coef.row(i) = arma::trans(arma::solve(X( arma::span(i,i + cppWindow), arma::span(0,matCols)) , y.rows(i,i + cppWindow)));
}
return List::create(_["coefficients"] = coef);
}
and then loaded it into R with sourceCpp(file=".../rollCoef.cpp").
It is much faster than rollapply and worked fine on small examples, but when I applied it to ~200000 observations about half of the output was NA, whereas the rollapply/fastLm combination produced none.
So I need some help. What is wrong with my function? Why are there NAs in its output but none from rollapply/fastLm when, if I understand correctly, both are based on arma::solve? Any help is highly appreciated.
UPDATE
Here is reproducible code:
require(Rcpp)
require(RcppArmadillo)
require(zoo)
require(repmis)
myData <- source_DropboxData(file = "example.csv",
key = "cbrmkkbssu5bn96", sep = ",", header = TRUE)
## To use my custom function "rollCoef" you first need to load it into R.
## The C++ code is shown above in the main question.
## Save it wherever you like as "rollCoef.cpp" and then load it into R with:
sourceCpp(file=".../rollCoef.cpp") # replace with your actual path
myCoef = rollCoef(as.matrix(myData[,2]),myData[,1],260)
summary(unlist(myCoef)) # 80923 NA's
dolm = function(x) coef(fastLmPure(as.matrix(x[,2]), x[,1]))
myCoef2 = rollapply(myData, 260, dolm, by.column = FALSE)
summary(myCoef2) # 80923 NA's
dolm2 = function(x) coef(fastLm(x[,1] ~ x[,2] + 0, data = as.data.frame(x)))
myCoef3 = rollapply(myData, 260, dolm2, by.column = FALSE)
summary(myCoef3) # !!! No NA's !!!
head(unlist(myCoef)) ; head(unlist(myCoef2)) ; head(myCoef3)
So the output of my function is identical to the output of RcppArmadillo's fastLmPure combined with rollapply, and they both produce NAs, but rollapply with fastLm does not. As I understand it, for example from HERE and HERE, fastLm basically calls fastLmPure, so why are there no NAs with the third method? Are there additional capabilities in fastLm that prevent NAs and that I didn't spot?
There is an entire package, RcppRoll, for exactly this kind of custom rolling computation, and you should be able to extend it and its rollit() function to do rolling lm() as well.
I'm experimenting with the Rcpp and inline packages to speed up my computation.
I want to know how to make C++ code (not a C++ function) work with these packages.
Here is an example I tried, which builds a dynamic array called 'tableau' and displays the result. I tried both 'cppFunction' and 'cxxfunction', but without success.
Can someone give me a hint?
require(inline);require(Rcpp)
src='vector < vector < int > > tableau (
{
{1,2,3,42},
{0,2,3},
{11,12}
}
);
return tableau;
'
cppFunction(src)
Error in sourceCpp(code = code, env = env, rebuild = rebuild, showOutput = showOutput, :
Error 1 occurred building shared library.
In addition: Warning message:
No function found for Rcpp::export attribute at file7bc1b0f5993.cpp:5
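cppFunction() needs a complete function definition rather than a bare block of statements, which is why no function was found for the export attribute. R itself also has no idea what to do with a vector<vector<int>>, so the result has to come back as an R type.
To return a list, you can use the types List and NumericVector: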
src = 'List tableau() {
NumericVector v1 = NumericVector::create(1,2,3,42);
NumericVector v2 = NumericVector::create(0,2,3);
NumericVector v3 = NumericVector::create(11,12);
return List::create(v1, v2, v3);
}'
createTableau <- cppFunction(src)
createTableau()
## [[1]]
## [1] 1 2 3 42
##
## [[2]]
## [1] 0 2 3
##
## [[3]]
## [1] 11 12
You should really read at least some of the documentation. Here's a good place to start: Rcpp Tutorial
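For comparison, the table from the question written as a complete function in standard C++ (C++11 or later) looks like the sketch below. It is independent of Rcpp, and make_tableau is a made-up name. Exposing it to R would still mean going through cppFunction() or sourceCpp() with a return type that Rcpp can convert, which is not shown here.

```cpp
#include <vector>

// Hypothetical standalone function: builds the ragged table from the
// question. The statements live inside a complete function definition with
// a name, a return type and a body, instead of being a bare code fragment.
std::vector<std::vector<int>> make_tableau() {
    return {
        {1, 2, 3, 42},
        {0, 2, 3},
        {11, 12}
    };
}
```

The rows keep their different lengths, which is why the R-side equivalent is a list of vectors rather than a matrix.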
According to the manual, Filter works on vectors, and it happens to work also on lists, eg.:
z <- list(a=1, b=2, c=3)
Filter(function(i){
z[[i]] > 1
}, z)
$b
[1] 2
$c
[1] 3
However, it doesn't work on lists of lists, eg.:
z <- list(z1=list(a=1,b=2,c=3), z2=list(a=1,b=1,c=1), z3=list())
Filter(function(i){
if(length(z[[i]])>0){
if(z[[i]]$b > 1)
TRUE
else
FALSE
}
else
FALSE
}, z)
Error in z[[i]] : invalid subscript type 'list'
What is the best way then to filter lists of lists without using nested loops? It could also be lists of lists of lists...
(I tried with nested lapply's instead, but couldn't manage to make it work.)
Edit: in the 2nd example, here is what I want to obtain:
list(z1=list(a=1,b=2,c=3))
that is, without z$z2 because z$z2$b is not greater than 1, and without z$z3 because it is empty.
I think you should use:
Filter(function(x){length(x)>0 && x[["b"]] > 1},z)
The predicate (the function you are using to filter z) applies to the elements of z, not their indexes.
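The same point, that a filter predicate sees each element rather than its index or name, can be illustrated outside R. This is a small C++11 sketch that uses std::map as a stand-in for named lists. Record, Records, has_b_above_one and keep_matching are all names invented for this example.

```cpp
#include <map>
#include <string>

// Stand-ins for R's named lists (assumed types for this illustration).
using Record = std::map<std::string, double>;
using Records = std::map<std::string, Record>;

// The predicate receives one element (a Record), never its key or position.
// An empty record has no "b", so it is rejected without any special case.
bool has_b_above_one(const Record& r) {
    const Record::const_iterator it = r.find("b");
    return it != r.end() && it->second > 1.0;
}

// Keep only the entries whose value satisfies the predicate.
Records keep_matching(const Records& z, bool (*pred)(const Record&)) {
    Records out;
    for (const auto& entry : z) {
        if (pred(entry.second)) {
            out.insert(entry);
        }
    }
    return out;
}
```

Applied to the z1, z2, z3 data from the question, keep_matching(z, has_b_above_one) keeps only z1, which matches the desired result.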
The modern tidy solution to this problem would be:
library(tidyverse)
z <- list(z1=list(a=1,b=2,c=3), z2=list(a=1,b=1,c=1), z3=list())
Then simply:
tibble(disc = z, Names = names(z)) %>%
hoist(disc, c = "c") %>%
filter(c == 3) %>%
unnest_wider(disc) %>%
split(.$Names) %>% map(select, -Names) %>%
map(as.list)
Note this is now super flexible, and easily allows other filtering, e.g. if a == 1
I had never used Filter prior to your question, so this was a good exercise for first thing in the morning :)
There are at least a couple of things going on that are tripping you up (I think).
Let's start with your first simple anonymous function, but let's make it stand alone so it's easier to read:
f <- function(i){
z[[i]] > 1
}
It should jump out at you that this function takes one argument, i, yet in the function it calls z. That's not very good "functional" programming :)
So start by changing that function to:
f <- function(i){
i > 1
}
And you'll see Filter will actually run against a list of lists:
z <- list(z1=list(a=1,b=2,c=3), z2=list(a=1,b=1,c=1))
Filter( f, z)
but it returns:
> Filter( f, z)
$z2
$z2$a
[1] 1
$z2$b
[1] 1
$z2$c
[1] 1
$<NA>
NULL
which isn't exactly what you want. Honestly I can't grok why it returns that result, maybe someone can explain it to me.
@DWin was barking up the right tree when he said that there should be a recursive solution. I hacked up a first stab at a recursive function, but you'll need to improve on it:
fancyFilter <- function(f, x){
if ( is.list( x[[1]] ) ) #only testing the first element... bad practice
lapply( x, fancyFilter, f=f ) #recursion FTW!!
else
return( lapply(x, Filter, f=f ) )
}
fancyFilter looks at the first element of the x passed to it and if that element is a list, it recursively calls fancyFilter on each element of the list. But what if element #2 is not a list? That's the sort of thing you should test and tease out whether it matters for you. But the result of fancyFilter seems to look like what you are after:
> fancyFilter(f, z)
$z1
$z1$a
numeric(0)
$z1$b
[1] 2
$z1$c
[1] 3
$z2
$z2$a
numeric(0)
$z2$b
numeric(0)
$z2$c
numeric(0)
You may want to add some logic to clean up the output so the FALSE results don't get turned into numeric(0). And, obviously, I did an example using only your simple function, not the more complex function you used in the second example.
No claims for beauty here and it does not do a depth search:
z2 <- lapply(z, function(x){ if( "b" %in% names(x) && x[["b"]] >1 ) x else {} } )
z2[unlist(lapply(z2, is.null))] <- NULL
> z2
$z1
$z1$a
[1] 1
$z1$b
[1] 2
$z1$c
[1] 3
EDIT: This code will traverse a list and assemble the nodes that have 'b' > 1. It needs some work to properly label the nodes. First a list with deeper nesting:
z <- list(z1=list(a=1,b=2,c=3), z2=list(a=1,b=1,c=1), z3=list(),
z4 = list(z5=list(a=5,b=6,c=7), z6=list(a=7,b=8,c=9)))
checkbGT1 <- function(ll) {
  root <- list()
  for (i in seq_along(ll)) {
    if ("b" %in% names(ll[[i]]) && ll[[i]]$b > 1) {
      root <- c(root, ll[[i]])
    } else {
      if (length(ll[[i]]) && is.list(ll[[i]])) {
        root <- c(root, list(checkbGT1(ll[[i]])))
      }
    }
  }
  return(root)
}
Filter sublists by key.
Written after reading the answers above, which helped me.
zall<-list(z1=list(list(key=1,b=2,c=3),list(key=2,b=3,c=4)))
zall
#> $z1
#> $z1[[1]]
#> $z1[[1]]$key
#> [1] 1
#>
#> $z1[[1]]$b
#> [1] 2
#>
#> $z1[[1]]$c
#> [1] 3
#>
#>
#> $z1[[2]]
#> $z1[[2]]$key
#> [1] 2
#>
#> $z1[[2]]$b
#> [1] 3
#>
#> $z1[[2]]$c
#> [1] 4
lapply(zall$z1, function(x){ x[intersect(names(x),"key")] } )
#> [[1]]
#> [[1]]$key
#> [1] 1
#>
#>
#> [[2]]
#> [[2]]$key
#> [1] 2
lapply(zall$z1, function(x){ x[setdiff(names(x),"key")] } )
#> [[1]]
#> [[1]]$b
#> [1] 2
#>
#> [[1]]$c
#> [1] 3
#>
#>
#> [[2]]
#> [[2]]$b
#> [1] 3
#>
#> [[2]]$c
#> [1] 4