I made a first stab at an Rcpp function via inline and it solved my speed problem (thanks Dirk!):
Replace negative values by zero
The initial version looked like this:
library(inline)
cpp_if_src <- '
Rcpp::NumericVector xa(a);
int n_xa = xa.size();
for(int i=0; i < n_xa; i++) {
if(xa[i]<0) xa[i] = 0;
}
return xa;
'
cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")
But when I called cpp_if(p), it overwrote p with the output, which was not what I intended. So I assumed it was passing by reference.
So I fixed it with the following version:
library(inline)
cpp_if_src <- '
Rcpp::NumericVector xa(a);
int n_xa = xa.size();
Rcpp::NumericVector xr(a);
for(int i=0; i < n_xa; i++) {
if(xr[i]<0) xr[i] = 0;
}
return xr;
'
cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")
Which seemed to work. But now the original version doesn't overwrite its input anymore when I re-load it into R (i.e. the same exact code now doesn't overwrite its input):
> cpp_if_src <- '
+ Rcpp::NumericVector xa(a);
+ int n_xa = xa.size();
+ for(int i=0; i < n_xa; i++) {
+ if(xa[i]<0) xa[i] = 0;
+ }
+ return xa;
+ '
> cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")
>
> p
[1] -5 -4 -3 -2 -1 0 1 2 3 4 5
> cpp_if(p)
[1] 0 0 0 0 0 0 1 2 3 4 5
> p
[1] -5 -4 -3 -2 -1 0 1 2 3 4 5
I'm not the only one who has tried to replicate this behavior and found inconsistent results:
https://chat.stackoverflow.com/transcript/message/4357344#4357344
What's going on here?
The key is the 'proxy model': your xa really is the same memory location as your original object, so you end up changing your original.
If you don't want that, make a (deep) copy using the clone() method, or explicitly create a new object into which the altered values get written. Your second version does neither: it simply uses two differently named variables which are both "pointers" (in the proxy model sense) to the original variable.
An additional complication, though, is the implicit cast and copy that happens when you pass an int vector (from R) to a NumericVector type: that creates a copy, and then the original no longer gets altered.
Here is a more explicit example, similar to one I use in the tutorials or workshops:
library(inline)
f1 <- cxxfunction(signature(a="numeric"), plugin="Rcpp", body='
Rcpp::NumericVector xa(a);
int n = xa.size();
for(int i=0; i < n; i++) {
if(xa[i]<0) xa[i] = 0;
}
return xa;
')
f2 <- cxxfunction(signature(a="numeric"), plugin="Rcpp", body='
Rcpp::NumericVector xa(a);
int n = xa.size();
Rcpp::NumericVector xr(a); // still points to a
for(int i=0; i < n; i++) {
if(xr[i]<0) xr[i] = 0;
}
return xr;
')
p <- seq(-2,2)
print(class(p))
print(cbind(f1(p), p))
print(cbind(f2(p), p))
p <- as.numeric(seq(-2,2))
print(class(p))
print(cbind(f1(p), p))
print(cbind(f2(p), p))
and this is what I see:
edd@max:~/svn/rcpp/pkg$ r /tmp/ari.r
Loading required package: methods
[1] "integer"
p
[1,] 0 -2
[2,] 0 -1
[3,] 0 0
[4,] 1 1
[5,] 2 2
p
[1,] 0 -2
[2,] 0 -1
[3,] 0 0
[4,] 1 1
[5,] 2 2
[1] "numeric"
p
[1,] 0 0
[2,] 0 0
[3,] 0 0
[4,] 1 1
[5,] 2 2
p
[1,] 0 0
[2,] 0 0
[3,] 0 0
[4,] 1 1
[5,] 2 2
edd@max:~/svn/rcpp/pkg$
So it really matters whether you pass int-to-float or float-to-float.
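The sharing-versus-copying behaviour described above can be mimicked without Rcpp. The following is only a plain C++ analogy, not Rcpp's implementation: NumHandle, IntHandle, wrap and deep_copy are hypothetical names invented for this sketch. In real Rcpp code the deep copy would be written as Rcpp::clone(xa).

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for an R numeric vector: a handle that shares its buffer.
struct NumHandle {
    std::shared_ptr<std::vector<double>> data;
};

// Hypothetical stand-in for an R integer vector.
struct IntHandle {
    std::shared_ptr<std::vector<int>> data;
};

// Same element type: the new handle points at the same buffer, nothing is copied.
NumHandle wrap(const NumHandle& x) { return NumHandle{x.data}; }

// Different element type: the values must be converted, so a new buffer is allocated.
NumHandle wrap(const IntHandle& x) {
    return NumHandle{std::make_shared<std::vector<double>>(x.data->begin(), x.data->end())};
}

// Deep copy, playing the role that clone() plays in the answer above.
NumHandle deep_copy(const NumHandle& x) {
    return NumHandle{std::make_shared<std::vector<double>>(*x.data)};
}

// Replace negative values by zero through whatever buffer the handle refers to.
void clamp_negatives(NumHandle h) {
    assert(h.data != nullptr);
    for (double& v : *h.data) {
        if (v < 0) v = 0;
    }
}
```

Calling clamp_negatives(wrap(p)) on a NumHandle p changes p itself, while clamp_negatives(deep_copy(p)) or clamp_negatives(wrap(q)) on an IntHandle q leaves the original untouched, which mirrors the numeric and integer cases in the output above.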
This is what I'm doing now
library(Rcpp)
A <- diag(c(1.0, 2.0, 3.0))
rownames(A) <- c('X', 'Y', 'Z')
colnames(A) <- c('A', 'B', 'C')
cppFunction('
void scaleMatrix(NumericMatrix& A, double x) {
A = A * x;
}')
Unfortunately, it doesn't work:
> A
A B C
X 1 0 0
Y 0 2 0
Z 0 0 3
> scaleMatrix(A, 2)
> A
A B C
X 1 0 0
Y 0 2 0
Z 0 0 3
I learned from Rcpp FAQ, Question 5.1 that Rcpp should be able to change the object I passed by value. Stealing an example from Dirk's answer to my previous question:
> library(Rcpp)
> cppFunction("void inplaceMod(NumericVector x) { x = x * 2; }")
> x <- as.numeric(1:5)
> inplaceMod(x)
> x
[1] 2 4 6 8 10
I'm confused: it is possible to modify a NumericVector in-place, but not a NumericMatrix?
You can preserve the row and column names by using NumericVector instead of NumericMatrix, keeping in mind that a matrix in R is just a vector with attached dimensions. You can do this switch either when going from R to C++ (scaleVector below) or within C++ (scaleMatrix below, taken from a now deleted answer by @Roland):
library(Rcpp)
cppFunction('
NumericVector scaleVector(NumericVector& A, double x) {
A = A * x;
return A;
}')
cppFunction('
NumericMatrix scaleMatrix(NumericMatrix& A, double x) {
NumericVector B = A;
B = B * x;
return A;
}')
If one applies these two functions to your matrix, the row and column names are preserved. However, the matrix is not changed in place:
A <- diag(1:3)
rownames(A) <- c('X', 'Y', 'Z')
colnames(A) <- c('A', 'B', 'C')
scaleMatrix(A, 2)
#> A B C
#> X 2 0 0
#> Y 0 4 0
#> Z 0 0 6
scaleVector(A, 2)
#> A B C
#> X 2 0 0
#> Y 0 4 0
#> Z 0 0 6
A
#> A B C
#> X 1 0 0
#> Y 0 2 0
#> Z 0 0 3
The reason for that is that diag(1:3) is actually an integer matrix, so a copy is made when you transfer it to a numeric matrix (or vector):
is.integer(A)
#> [1] TRUE
If one uses a numeric matrix to begin with, modification is done in place:
A <- diag(c(1.0, 2.0, 3.0))
rownames(A) <- c('X', 'Y', 'Z')
colnames(A) <- c('A', 'B', 'C')
scaleMatrix(A, 2)
#> A B C
#> X 2 0 0
#> Y 0 4 0
#> Z 0 0 6
scaleVector(A, 2)
#> A B C
#> X 4 0 0
#> Y 0 8 0
#> Z 0 0 12
A
#> A B C
#> X 4 0 0
#> Y 0 8 0
#> Z 0 0 12
I am a beginner in R studio, so hopefully someone can help me with this problem. The case: I want to make an if else loop. I made the following code for an l times m matrix:
for (i in 1:l){
for (j in 1:m){
if (is.na(quantilereturns[i,j]) < quantile(quantilereturns[,j], c(.1), na.rm=TRUE)) {
quantilereturns[i,j]
} else { (0) }
}
}
Summary: I want to make a matrix with values that are smaller than the quantile of a certain vector in the matrix quantilereturns. So when they are smaller than the 10% quantile they get their original value otherwise it will be a zero.
The code doesn't give any errors, but it doesn't change the values in the matrix either.
Can someone help me?
You need to assign the result to a cell of the matrix. I will take the matrix of a recent other thread as an example:
a <- c(4, -9, 2)
b <- c(-1, 3, -8)
c <- c(5, 2, 6)
d <- c(7, 9, -2)
matrix <- cbind(a,b,c,d)
d <- dim(matrix)
rows <- d[1]
columns <- d[2]
print("Before")
print(matrix)
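for (j in 1:columns) {
  # Compute the 10% quantile once per column, before any cell is overwritten
  threshold <- quantile(matrix[,j], c(.1), na.rm=TRUE)
  for (i in 1:rows) {
    # Compare the value itself; is.na() only returns TRUE/FALSE
    if (!is.na(matrix[i,j]) && matrix[i,j] >= threshold) {
      matrix[i,j] <- 0
    }
  }
}
print("After")
print(matrix)
this gives
[1] "Before"
a b c d
[1,] 4 -1 5 7
[2,] -9 3 2 9
[3,] 2 -8 6 -2
[1] "After"
a b c d
[1,] 0 0 0 0
[2,] -9 0 2 0
[3,] 0 -8 0 -2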
So the essential line you are looking for is matrix[i,j] <- 0
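Since the rest of this page is about moving such loops into C++, here is a minimal standalone sketch of the same rule in plain C++: cells below the column's 10% quantile keep their value, everything else becomes zero. The names quantile7 and keep_below_quantile are made up for this illustration, the matrix is stored as a vector of columns, and the sketch assumes there are no missing values. The quantile uses linear interpolation between order statistics, which is the rule R's quantile() applies by default (type 7).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sample quantile by linear interpolation between order statistics
// (R's default "type 7" rule). Takes a copy so the caller's data stays unsorted.
double quantile7(std::vector<double> v, double prob) {
    assert(!v.empty() && prob >= 0.0 && prob <= 1.0);
    std::sort(v.begin(), v.end());
    const double h = static_cast<double>(v.size() - 1) * prob;
    const std::size_t lo = static_cast<std::size_t>(std::floor(h));
    const std::size_t hi = std::min(lo + 1, v.size() - 1);
    return v[lo] + (h - static_cast<double>(lo)) * (v[hi] - v[lo]);
}

// cols[j] holds column j. Cells at or above the column's quantile become 0;
// cells below it keep their original value.
void keep_below_quantile(std::vector<std::vector<double>>& cols, double prob) {
    for (auto& col : cols) {
        // The threshold is computed once, before any cell of the column changes.
        const double threshold = quantile7(col, prob);
        for (double& x : col) {
            if (x >= threshold) x = 0.0;
        }
    }
}
```

Computing the threshold before the inner loop matters: if the quantile were recomputed after each assignment, the zeros already written would shift it.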
I would like to collapse the rows of a transposed NumericMatrix using Rcpp. For instance:
library("data.table")
library("Rcpp")
dt1 <- data.table(V1=c(1, 0, 2),
V2=c(1, 1, 0),
V3=c(1, 0, 1),
V4=c(0, 1, 2),
V5=c(1, 1, 1))
cppFunction('NumericMatrix transpose(DataFrame data) {
NumericMatrix genotypes = internal::convert_using_rfunction(data, "as.matrix");
NumericMatrix tgeno(data.ncol(), data.nrow());
int number_samples = data.ncol();
int number_snps = data.nrow();
for (int i = 0; i < number_snps; i++) {
for (int j = 0; j < number_samples; j++) {
tgeno(j,i) = genotypes(i,j);
}
}
return tgeno;
}')
dt1
transpose(dt1)
Original Matrix
V1 V2 V3 V4 V5
1: 1 1 1 0 1
2: 0 1 0 1 1
3: 2 0 1 2 1
Transposed Matrix
[,1] [,2] [,3]
[1,] 1 0 2
[2,] 1 1 0
[3,] 1 0 1
[4,] 0 1 2
[5,] 1 1 1
I would like to have the following matrix:
[,1]
[1,] 102
[2,] 110
[3,] 101
[4,] 012
[5,] 111
Could anyone suggest a way to do this?
Maybe as a starting point, assuming that the numbers you concatenate consist only of a single digit:
#include <Rcpp.h>
#include <string>
#include <vector>

//' @export
// [[Rcpp::export]]
std::vector<std::string> string_collapse(const Rcpp::DataFrame& data)
{
R_xlen_t nrow = data.nrow();
R_xlen_t ncol = data.ncol();
std::vector<std::string> ret(ncol);
for (R_xlen_t j = 0; j < ncol; ++j) {
const auto& col = Rcpp::as<Rcpp::NumericVector>(data[j]);
std::string ccstr;
ccstr.reserve(nrow);
for (const auto& chr: col) {
ccstr += std::to_string(chr)[0];
}
ret[j] = ccstr;
}
return ret;
}
It gives
dat <- data.frame(V1=c(1, 0, 2),
V2=c(1, 1, 0),
V3=c(1, 0, 1),
V4=c(0, 1, 2),
V5=c(1, 1, 1))
string_collapse(dat)
[1] "102" "110" "101" "012" "111"
But a quick benchmark comparing it to a pure R-solution suggests that you should not expect miracles. Probably there is still room for optimization.
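If the single-digit assumption is too strict, the conversion step can keep every digit instead of only the first character. The sketch below shows just that step in plain C++, without Rcpp; collapse_columns is a hypothetical name, the table is passed as a vector of columns, and the values are assumed to be whole, non-negative numbers stored as doubles.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// cols[j] holds column j of the table. Each column is collapsed into one string.
std::vector<std::string> collapse_columns(const std::vector<std::vector<double>>& cols) {
    std::vector<std::string> out;
    out.reserve(cols.size());
    for (const auto& col : cols) {
        std::string s;
        for (double x : col) {
            // Assumption of this sketch: whole, non-negative values only.
            assert(x >= 0 && x == std::floor(x));
            // Converting through an integer type keeps all digits and drops ".000000".
            s += std::to_string(static_cast<long long>(x));
        }
        out.push_back(s);
    }
    return out;
}
```

Note that with multi-digit values the collapsed string is no longer unambiguous (10 followed by 2 and 1 followed by 2 are only told apart by the 0), so a separator may be needed depending on what the result is used for.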
Once you have transposed the matrix, you can collapse its rows as follows (here using the transpose() function defined in the question):
matrix(apply(transpose(dt1), 1, paste0, collapse = ""), ncol = 1)
Given a matrix with M rows and N columns, we want to fill each row with non-negative integers so that every row sums to a given value.
Note that M and N are pre-computed using a certain formula, so the dimensions are guaranteed to match the number of rows needed for the desired condition (i.e. sum_val below).
This is implemented in R in the partitions package.
library(partitions)
# In this example, we impose condition
# that each rows must sum up to 2 in total
# And each row has 5 columns
sum_val <- 2
n <- 5
#The above two parameters are predefined.
t(as.matrix(compositions(sum_val, n)))
[,1] [,2] [,3] [,4] [,5]
[1,] 2 0 0 0 0
[2,] 1 1 0 0 0
[3,] 0 2 0 0 0
[4,] 1 0 1 0 0
[5,] 0 1 1 0 0
[6,] 0 0 2 0 0
[7,] 1 0 0 1 0
[8,] 0 1 0 1 0
[9,] 0 0 1 1 0
[10,] 0 0 0 2 0
[11,] 1 0 0 0 1
[12,] 0 1 0 0 1
[13,] 0 0 1 0 1
[14,] 0 0 0 1 1
[15,] 0 0 0 0 2
Is there any existing implementation in C++?
Recursive version
Here is a recursive solution. You have a sequence a where you keep track of the numbers you already have set. Each recursive call will assign valid numbers to one of these elements in a loop, before recursively calling that function for the remainder of the list.
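#include <cstdio>
#include <vector>

// Print one row on a single line.
void print(const std::vector<int>& a) {
    for (int v : a) printf("%3d ", v);
    printf("\n");
}

void recurse(std::vector<int>& a, int pos, int remaining) {
    // Nothing left to distribute: the entries after pos are already zero.
    if (remaining == 0) { print(a); return; }
    // Ran out of positions while something still remains: dead end.
    if (pos == static_cast<int>(a.size())) { return; }
    for (int i = remaining; i >= 0; --i) {
        a[pos] = i;
        recurse(a, pos + 1, remaining - i);
    }
}
void print_partitions(int sum_val, int n) {
    std::vector<int> a(n);
    recurse(a, 0, sum_val);
}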
Proof of concept run visible at http://ideone.com/oJNvmu.
Iterative version
Your comment below indicates a performance problem. While it seems very likely that I/O is eating most of your performance, here is an iterative solution which avoids the function call overhead of the recursive approach.
void print_partitions(int sum_val, int n) {
int pos = 0, last = n - 1;
int a[n]; // dynamic stack-allocated arrays are a gcc extension
for (int i = 1; i != n; ++i)
a[i] = 0;
a[0] = sum_val;
while (true) {
for (int i = 0; i != last; ++i)
printf("%3d ", a[i]);
printf("%3d\n", a[last]);
if (pos != last) {
--a[pos];
++pos;
a[pos] = 1;
}
else {
if (a[last] == sum_val)
return;
for (--pos; a[pos] == 0; --pos);
--a[pos];
int tmp = 1 + a[last];
++pos;
a[last] = 0;
a[pos] = tmp;
}
}
}
The general idea and the order in which things are printed is the same as for the recursive approach. Instead of maintaining a counter remaining, all the tokens (or whatever it is you are partitioning) are immediately dropped in the place where they belong for the next partition to be printed. pos is always the last non-zero field. If that is not the last, then you obtain the next partition by taking one token from pos and moving it to the place after that. If it is the last, then you take all tokens from that last place, find the last non-zero place before that and take one token from there as well, then dump all these tokens onto the place after the one where you took the single token.
Demo run at http://ideone.com/N3lSbQ.
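If I/O is indeed the bottleneck, one option is to collect the rows instead of printing them. The following sketch follows the same token-moving idea as the iterative version but stores each row in a std::vector; the function name compositions and the explicit handling of sum_val == 0 are additions made for this illustration and are not part of the code above.

```cpp
#include <cassert>
#include <vector>

// Returns every way to write sum_val as an ordered sum of n non-negative integers.
std::vector<std::vector<int>> compositions(int sum_val, int n) {
    assert(sum_val >= 0 && n >= 1);
    std::vector<std::vector<int>> rows;
    std::vector<int> a(n, 0);
    a[0] = sum_val;
    if (sum_val == 0) {  // only the all-zero row exists
        rows.push_back(a);
        return rows;
    }
    const int last = n - 1;
    int pos = 0;  // index of the last non-zero entry
    while (true) {
        rows.push_back(a);
        if (pos != last) {
            // Move one token from pos to the position right after it.
            --a[pos];
            ++pos;
            a[pos] = 1;
        } else {
            if (a[last] == sum_val) return rows;  // everything sits in the last slot
            // Find the previous non-zero entry, take one token from it, and drop
            // it together with the last slot's tokens right after that entry.
            for (--pos; a[pos] == 0; --pos) {}
            --a[pos];
            const int tmp = 1 + a[last];
            ++pos;
            a[last] = 0;
            a[pos] = tmp;
        }
    }
}
```

For sum_val = 2 and n = 5 this yields 15 rows, the same count as the R output in the question, although the rows come out in a different order.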
You can implement it yourself:
such a partition is defined by 4 integers 0 <= x[0] <= x[1] <= x[2] <= x[3] <= 2;
the values in the corresponding row are just the differences x[0]-0, x[1]-x[0], x[2]-x[1], x[3]-x[2] and 2-x[3].
If the number of columns (5) is fixed, you have 4 nested loops;
if it is not, you can formulate the problem recursively.
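A recursive formulation of this idea could look like the sketch below. It builds the non-decreasing sequence x one entry at a time and turns each complete sequence into a row of differences. The names extend and rows_from_cut_points are invented for this example, and it assumes n >= 1 and sum_val >= 0.

```cpp
#include <cassert>
#include <vector>

// Extends the non-decreasing sequence x; once it has n - 1 entries,
// converts it into one row of n differences.
void extend(std::vector<int>& x, int n, int sum_val,
            std::vector<std::vector<int>>& rows) {
    if (static_cast<int>(x.size()) == n - 1) {
        std::vector<int> row;
        int prev = 0;
        for (int v : x) {
            row.push_back(v - prev);  // x[0]-0, x[1]-x[0], ...
            prev = v;
        }
        row.push_back(sum_val - prev);  // final difference up to sum_val
        rows.push_back(row);
        return;
    }
    // The next entry may not be smaller than the previous one.
    const int lowest = x.empty() ? 0 : x.back();
    for (int v = lowest; v <= sum_val; ++v) {
        x.push_back(v);
        extend(x, n, sum_val, rows);
        x.pop_back();
    }
}

std::vector<std::vector<int>> rows_from_cut_points(int sum_val, int n) {
    assert(sum_val >= 0 && n >= 1);
    std::vector<std::vector<int>> rows;
    std::vector<int> x;
    extend(x, n, sum_val, rows);
    return rows;
}
```

Each loop level in extend corresponds to one of the nested loops mentioned above, so the recursion depth is n - 1 regardless of sum_val.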