Using C++ code in R

I'm experimenting with the Rcpp and inline packages to speed up my computation.
I want to know how to make C++ code (not a C++ function) work with these packages.
Here is an example I tried, which builds a dynamic array called 'tableau' and displays the result. I tried both 'cppFunction' and 'cxxfunction', but without success.
Can someone give me a hint?
require(inline);require(Rcpp)
src='vector < vector < int > > tableau (
{
{1,2,3,42},
{0,2,3},
{11,12}
}
);
return tableau;
'
cppFunction(src)
Error in sourceCpp(code = code, env = env, rebuild = rebuild, showOutput = showOutput, :
Error 1 occurred building shared library.
In addition: Warning message:
No function found for Rcpp::export attribute at file7bc1b0f5993.cpp:5

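cppFunction() needs a complete function definition, not a bare code fragment; that is what the "No function found for Rcpp::export attribute" warning is telling you. The function also has to return a type that Rcpp knows how to convert to an R object.
To return a list, you can use the types List and NumericVector: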
src = 'List tableau() {
NumericVector v1 = NumericVector::create(1,2,3,42);
NumericVector v2 = NumericVector::create(0,2,3);
NumericVector v3 = NumericVector::create(11,12);
return List::create(v1, v2, v3);
}'
createTableau <- cppFunction(src)
createTableau()
## [[1]]
## [1] 1 2 3 42
##
## [[2]]
## [1] 0 2 3
##
## [[3]]
## [1] 11 12
You should really read at least some of the documentation. Here's a good place to start: Rcpp Tutorial

Related

Transform arma::cube subview into NumericVector to use sugar

I passed a 3D array from R into C++ and ran into type conversion issues. How do we transform arma::cube subviews from RcppArmadillo into NumericVectors so that we can operate on them with Rcpp sugar functions like which_min?
Say you have a 3D cube Q with some numeric entries. My goal is to get the index of the minimum value of the column entries for each row i and for each third dimension k. In R syntax this is which.min(Q[i,,k]).
For example for i = 1 and k = 1
cube Q = randu<cube>(3,3,3);
which_min(Q.slice(1).row(1)); // this fails
I thought a conversion to NumericVector would do the trick, but this conversion fails
which_min(as<NumericVector>(Q.slice(1).row(1))); // conversion failed
How can I get this to work? Thank you for your help.
You have a couple of options here:
You can just use the Armadillo function for this, the member function .index_min() (see Armadillo documentation here).
You can use Rcpp::wrap(), which "transforms an arbitrary object into a SEXP", to turn the arma::cube subview into an Rcpp::NumericVector and then use the sugar function Rcpp::which_min().
Initially I only gave the first option, since it seems the more straightforward way to accomplish your objective. I added the second option in an update to the answer, since arbitrary conversions may be part of what you're curious about.
I put the following C++ code in a file so-answer.cpp:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::export]]
Rcpp::List index_min_test() {
arma::cube Q = arma::randu<arma::cube>(3, 3, 3);
int whichmin = Q.slice(1).row(1).index_min();
Rcpp::List result = Rcpp::List::create(Rcpp::Named("Q") = Q,
Rcpp::Named("whichmin") = whichmin);
return result;
}
// [[Rcpp::export]]
Rcpp::List which_min_test() {
arma::cube Q = arma::randu<arma::cube>(3, 3, 3);
Rcpp::NumericVector x = Rcpp::wrap(Q.slice(1).row(1));
int whichmin = Rcpp::which_min(x);
Rcpp::List result = Rcpp::List::create(Rcpp::Named("Q") = Q,
Rcpp::Named("whichmin") = whichmin);
return result;
}
We have one function that uses Armadillo's .index_min() and one that uses Rcpp::wrap() to enable the use of Rcpp::which_min().
Then I use Rcpp::sourceCpp() to compile it, make the functions available to R, and demonstrate calling them with a couple of different seeds:
Rcpp::sourceCpp("so-answer.cpp")
set.seed(1)
arma <- index_min_test()
set.seed(1)
wrap <- which_min_test()
arma$Q[2, , 2]
#> [1] 0.2059746 0.3841037 0.7176185
wrap$Q[2, , 2]
#> [1] 0.2059746 0.3841037 0.7176185
arma$whichmin
#> [1] 0
wrap$whichmin
#> [1] 0
set.seed(2)
arma <- index_min_test()
set.seed(2)
wrap <- which_min_test()
arma$Q[2, , 2]
#> [1] 0.5526741 0.1808201 0.9763985
wrap$Q[2, , 2]
#> [1] 0.5526741 0.1808201 0.9763985
arma$whichmin
#> [1] 1
wrap$whichmin
#> [1] 1
library(microbenchmark)
microbenchmark(arma = index_min_test(), wrap = which_min_test())
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> arma 12.981 13.7105 15.09386 14.1970 14.9920 62.907 100 a
#> wrap 13.636 14.3490 15.66753 14.7405 15.5415 64.189 100 a
Created on 2018-12-21 by the reprex package (v0.2.1)
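As a cross-check on the R side, here is a minimal sketch of the computation the question describes, which.min(Q[i,,k]) for every i and k, done in base R with apply(). The array Q here is just random data made up for illustration. Note that which.min() is one-based, while the whichmin values returned by the C++ functions above are zero-based (the output shows 0 when the first element is the smallest), so you would subtract 1 to compare the two.

```r
# Hypothetical 3 x 3 x 3 array standing in for the cube Q from the question
set.seed(1)
Q <- array(runif(27), dim = c(3, 3, 3))

# For every row i and slice k, the (one-based) position of the smallest
# entry of Q[i, , k]; the result is a 3 x 3 matrix indexed by [i, k]
idx <- apply(Q, c(1, 3), which.min)

# Zero-based version, comparable to the indices reported from C++
idx_zero_based <- idx - 1L
```

idx[2, 2] then corresponds to which.min(Q[2, , 2]), the same row and slice that Q.slice(1).row(1) picks out on the C++ side.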

R match between two comma-separated strings

I am trying to find an elegant way to find matches between the two following character columns in a data frame. The complicated part is that either string can contain a comma-separated list, and if a member of one list is a match for any member of the other list, then that whole entry would be considered a match. I'm not sure how well I've explained this, so here's sample data and output:
Alt1:
AT
A
G
CGTCC,AT
CGC
Alt2:
AA
A
GG
AT,GGT
CG
Expected Match per row:
Row 1 = none
Row 2 = A
Row 3 = none
Row 4 = AT
Row 5 = none
Non-working solutions:
First attempt: merge entire data frames by desired columns, then match up the alt columns shown above:
match1 = data.frame(merge(vcf.df, ref.df, by=c("chr", "start", "end", "ref")))
matches = unique(match1[unlist(sapply(match1$Alt1 grep, match1$Alt2, fixed=TRUE)),])
Second method, using the findOverlaps function from VariantAnnotation/GRanges:
findOverlaps(ranges(vcf1), ranges(vcf2))
Any suggestions would be greatly appreciated! Thank you!
Solution
Thanks to @Marat Talipov's answer below, the following solution works to compare two comma-separated strings:
> ##read in edited kaviar vcf and human ref
> ref <- readVcfAsVRanges("ref.vcf.gz", humie_ref)
Warning message:
In .vcf_usertag(map, tag, ...) :
ScanVcfParam ‘geno’ fields not present: ‘AD’
> ##rename chromosomes to match with vcf files
> ref <- renameSeqlevels(ref, c("1"="chr1"))
> ##################################
> ## Gather VCF files to process ##
> ##################################
> ##data frame *.vcf.gz files in directory path
> vcf_path <- data.frame(path=list.files(vcf_dir, pattern="\\.vcf\\.gz$", full.names=TRUE))
> ##read in everything but sample data for speediness
> vcf_param = ScanVcfParam(samples=NA)
> vcf <- readVcfAsVRanges("test.vcf.gz", humie_ref, param=vcf_param)
> #################
> ## Match SNP's ##
> #################
> ##create data frames of info to match on
> vcf.df = data.frame(chr =as.character(seqnames(vcf)), start = start(vcf), end = end(vcf), ref = as.character(ref(vcf)),
+ alt=alt(vcf), stringsAsFactors=FALSE)
> ref.df = data.frame(chr =as.character(seqnames(ref)), start = start(ref), end = end(ref),
+ ref = as.character(ref(ref)), alt=alt(ref), stringsAsFactors=FALSE)
>
> ##merge based on all positional fields except vcf
> col_match = data.frame(merge(vcf.df, ref.df, by=c("chr", "start", "end", "ref")))
> library(stringi)
> ##split each alt column by comma and bind together
> M1 <- stri_list2matrix(sapply(col_match$alt.x,strsplit,','))
> M2 <- stri_list2matrix(sapply(col_match$alt.y,strsplit,','))
> M <- rbind(M1,M2)
> ##compare results
> result <- apply(M,2,function(z) unique(na.omit(z[duplicated(z)])))
> ##add results column to col_match df for checking/subsetting
> col_match$match = result
> head(col_match)
chr start end ref alt.x alt.y match
1 chr1 39998059 39998059 A G G G
2 chr1 39998059 39998059 A G G G
3 chr1 39998084 39998084 C A A A
4 chr1 39998084 39998084 C A A A
5 chr1 39998085 39998085 G A A A
6 chr1 39998085 39998085 G A A A
If the input lists are of equal length and you'd like to compare their elements pairwise, you could use this solution:
library(stringi)
M1 <- stri_list2matrix(sapply(Alt1,strsplit,','))
M2 <- stri_list2matrix(sapply(Alt2,strsplit,','))
M <- rbind(M1,M2)
result <- apply(M,2,function(z) unique(na.omit(z[duplicated(z)])))
Sample input:
Alt1 <- list('AT','A','G','CGTCC,AT','CGC','GG,CC')
Alt2 <- list('AA','A','GG','AT,GGT','CG','GG,CC')
Output:
# [[1]]
# character(0)
#
# [[2]]
# [1] "A"
#
# [[3]]
# character(0)
#
# [[4]]
# [1] "AT"
#
# [[5]]
# character(0)
#
# [[6]]
# [1] "GG" "CC"
Sticking with the stringi package, you could do something like this, using the Alt1 and Alt2 data from Marat's answer.
library(stringi)
f <- function(x, y) {
ssf <- stri_split_fixed(c(x, y), ",", simplify = TRUE)
if(any(sd <- stri_duplicated(ssf))) ssf[sd] else NA_character_
}
Map(f, Alt1, Alt2)
# [[1]]
# [1] NA
#
# [[2]]
# [1] "A"
#
# [[3]]
# [1] NA
#
# [[4]]
# [1] "AT"
#
# [[5]]
# [1] NA
#
# [[6]]
# [1] "GG" "CC"
Or in base R, we can use scan() to separate the strings with commas.
g <- function(x, y, sep = ",") {
s <- scan(text = c(x, y), what = "", sep = sep, quiet = TRUE)
s[duplicated(s)]
}
Map(g, Alt1, Alt2)
You could do something like this:
Alt1 <- list('AT','A','G',c('CGTCC','AT'),'CGC')
Alt2 <- list('AA','A','GG',c('AT','GGT'),'CG')
# make sure you change the comma-separated strings within the lists into vectors
matchlist <- list()
for (i in seq_along(Alt1)) {
# elements shared by the two vectors in row i
common <- intersect(Alt1[[i]], Alt2[[i]])
matchlist[[i]] <- if (length(common) > 0) {
paste("Row", i, "=", paste(common, collapse = ","))
} else {
paste("Row", i, "= none")
}
}
print(matchlist)
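For reference, here is a minimal base R sketch of the same idea applied to the sample data from the question: split both columns on commas and take the row-wise intersection. It assumes Alt1 and Alt2 are plain character vectors, as they would be in a data frame column; the names split1, split2, matches and label are made up for this illustration.

```r
# Sample data from the question, as character vectors
Alt1 <- c("AT", "A", "G", "CGTCC,AT", "CGC")
Alt2 <- c("AA", "A", "GG", "AT,GGT", "CG")

# Split every entry on commas; each result is a list of character vectors
split1 <- strsplit(Alt1, ",", fixed = TRUE)
split2 <- strsplit(Alt2, ",", fixed = TRUE)

# Row-wise intersection: the members that appear on both sides
matches <- Map(intersect, split1, split2)

# Collapse to one label per row, using "none" when nothing is shared
label <- vapply(
  matches,
  function(m) if (length(m) > 0) paste(m, collapse = ",") else "none",
  character(1)
)
```

Because intersect() compares the two sides against each other, a value repeated within a single entry is not reported as a match, which can happen with approaches that pool both sides and then look for duplicates.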

How to properly manipulate a string column in a data frame in R?

I have a data.frame with a string column that contains periods, e.g. "a.b.c.X". I want to split the string on the periods and keep the third segment, e.g. "c" in the example given. Here is what I'm doing.
> df = data.frame(v=c("a.b.a.X", "a.b.b.X", "a.b.c.X"), b=seq(1,3))
> df
v b
1 a.b.a.X 1
2 a.b.b.X 2
3 a.b.c.X 3
And what I want is
> df
v b
1 a 1
2 b 2
3 c 3
I'm attempting to use within, but I'm getting strange results. The value in the first row in the first column is being repeated.
> get = function(x) { unlist(strsplit(x, "\\."))[3] }
> within(df, v <- get(as.character(v)))
v b
1 a 1
2 a 2
3 a 3
What is the best practice for doing this? What am I doing wrong?
Update:
Here is the solution I used from @agstudy's answer:
> df = data.frame(v=c("a.b.a.X", "a.b.b.X", "a.b.c.X"), b=seq(1,3))
> get = function(x) gsub(".*?[.].*?[.](.*?)[.].*", '\\1', x)
> within(df, v <- get(v))
v b
1 a 1
2 b 2
3 c 3
Using a regular expression, you can do:
gsub(".*?[.].*?[.](.*?)[.].*", '\\1', df$v)
[1] "a" "b" "c"
Or, more concisely:
gsub("(.*?[.]){2}(.*?)[.].*", '\\2', df$v)
The problem is not with within but with your get function. It returns a single string ("a"), which gets recycled when it is assigned to a column of your data.frame. Your code should look like this:
get.third <- function(x) sapply(strsplit(x, "\\."), `[[`, 3)
within(df, v <- get.third(as.character(v)))
Here is one possible solution:
df[, "v"] <- do.call(rbind, strsplit(as.character(df[, "v"]), "\\."))[, 3]
## > df
## v b
## 1 a 1
## 2 b 2
## 3 c 3
The answer to "what am I doing wrong" is that the bit of code that you thought was extracting the third element of each split string was actually putting all the elements of all your strings in a single vector, and then returning the third element of that:
get = function(x) {
splits = strsplit(x, "\\.")
print("All the elements: ")
print(unlist(splits))
print("The third element:")
print(unlist(splits)[3])
# What you actually wanted: the third element of each split, returned explicitly
sapply(splits, function (x) x[3])
}
within(df, v2 <- get(as.character(v)))
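The difference between the two behaviours can be seen without a data frame at all. Here is a minimal sketch using the three strings from the question; the variable names are made up for the illustration, and vapply() is used in place of sapply() only so that the result type is checked.

```r
v <- c("a.b.a.X", "a.b.b.X", "a.b.c.X")

# One character vector of pieces per input string
pieces <- strsplit(v, ".", fixed = TRUE)

# Flattening first throws away the per-string grouping, so [3] is simply
# the third piece of the first string
flattened_third <- unlist(pieces)[3]

# Indexing inside each element keeps one result per input string
third_each <- vapply(pieces, `[`, character(1), 3)
```

flattened_third has length 1, which is why it is recycled down the whole column, while third_each has one entry per row.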

how to combine vectors with different length within a list in R?

I have a problem when combining the following vectors included in the list:
x <- list(as.numeric(c(1,4)),as.numeric(c(3,19,11)))
names (x[[1]]) <- c("species.A","species.C")
names (x[[2]]) <- c("species.A","species.B","species.C")
which gives the following list:
>x
>[[1]]
>species.A species.C
> 1 4
>[[2]]
>species.A species.B species.C
> 3 19 11
combining them using the do.call function:
y<- do.call(cbind,x)
gives:
>y
> [,1] [,2]
> species.A 1 3
> species.B 4 19
> species.C 1 11
while I would like to obtain this:
> [,1] [,2]
> species.A 1 3
> species.B NA 19
> species.C 4 11
You need to give R a bit more help, by first preparing the particular vectors, all of the same length, that you eventually want to cbind together. Otherwise (as you've seen) R uses its usual recycling rules to fill out the matrix.
Try something like this:
spp <- paste("species", c("A", "B", "C"), sep=".")
x2 <- lapply(x, FUN=function(X) X[spp])
mat <- do.call("cbind", x2)
row.names(mat) <- spp
mat
[,1] [,2]
species.A 1 3
species.B NA 19
species.C 4 11
EDIT: As Brian mentions in comments, this could be made a bit more compact (but at the expense of some readability). Which one you use is just a matter of taste:
mat <- do.call("cbind", lapply(x, "[", spp))
row.names(mat) <- spp
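If you don't want to hard-code the species names, one option is to derive them from the list itself. This is only a sketch of that variation, assuming every vector in the list is named; all_names is a name made up for the example.

```r
x <- list(
  c(species.A = 1, species.C = 4),
  c(species.A = 3, species.B = 19, species.C = 11)
)

# Every name that occurs in any of the vectors, in sorted order
all_names <- sort(unique(unlist(lapply(x, names))))

# Index each vector by the full set of names; a name that is missing from
# a vector yields NA, so all columns end up with the same length
mat <- vapply(x, function(v) unname(v[all_names]), numeric(length(all_names)))
rownames(mat) <- all_names
```

This gives the same matrix as the code above, with NA for species.B in the first column.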
It looks like you're actually trying to do a merge. As such, merge will work. You just have to tell it to merge on the names, and to keep all rows.
do.call(merge, c(x, by=0, all=TRUE)) # by=0 and by="row.names" are the same
(This will create a data frame rather than a matrix, but for most purposes that shouldn't be an issue.)
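Or, merging the two vectors directly on their names (which become row names when merge converts the vectors to data frames):
merge(x = x[[1]], y = x[[2]], by = "row.names", all.y = TRUE)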

R divide 2 list objects which each contain the same size xts objects

I have 2 lists whose components are xts objects (co and oc). I want to produce another list object that holds the result of co / oc, computed component by component.
> length(co)
[1] 1064
> length(oc)
[1] 1064
> tail(co[[1]])
[,1]
2011-12-22 0.3018297
2011-12-23 0.2987450
2011-12-27 0.2699710
2011-12-28 0.2706428
2011-12-29 0.2098897
2011-12-30 0.2089051
> tail(oc[[1]])
[,1]
2011-12-22 0.6426411
2011-12-23 0.6462834
2011-12-27 0.6466680
2011-12-28 0.6741420
2011-12-29 0.6781371
2011-12-30 0.6650130
> co / oc
Error in co/oc : non-numeric argument to binary operator
If I specify an index of the lists the operation succeeds as follows:
> tail(co[[1]] / oc[[1]])
[,1]
2011-12-22 0.4696707
2011-12-23 0.4622507
2011-12-27 0.4174800
2011-12-28 0.4014627
2011-12-29 0.3095093
2011-12-30 0.3141369
I want to do this without writing a loop to iterate through each component of the two lists (1064 components in total).
Any help would be greatly appreciated. Thank you.
Something like this may work:
mapply("/",co,oc,SIMPLIFY = FALSE)
although there are probably countless ways of doing this that are all mostly equivalent.
Here's a minimal example using some sample data from the xts package:
library(xts)
data(sample_matrix)
sample.xts <- as.xts(sample_matrix, descr='my new xts object')
v1 <- list(a = sample.xts[,1],b = sample.xts[,2])
v2 <- list(a = sample.xts[,3],b = sample.xts[,4])
mapply("/",v1,v2,SIMPLIFY = FALSE)
Update:
We can now use Map, which is basically mapply(..., SIMPLIFY = FALSE):
Map("/",co,oc)
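To see the element-wise pairing without needing xts installed, here is a small sketch in which plain numeric vectors stand in for the xts objects (an assumption made purely for illustration; the values are invented).

```r
# Two lists of equal length; numeric vectors stand in for xts objects
co <- list(a = c(1, 2, 3), b = c(10, 20))
oc <- list(a = c(2, 4, 6), b = c(5, 10))

# Divide the i-th component of co by the i-th component of oc
ratio_mapply <- mapply("/", co, oc, SIMPLIFY = FALSE)
ratio_map <- Map("/", co, oc)
```

Both calls return a list with one component per input pair, carrying over the names of the first list.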