How to scale a NumericMatrix in-place with Rcpp? - c++

This is what I'm doing now
library(Rcpp)
A <- diag(c(1.0, 2.0, 3.0))
rownames(A) <- c('X', 'Y', 'Z')
colnames(A) <- c('A', 'B', 'C')
cppFunction('
void scaleMatrix(NumericMatrix& A, double x) {
A = A * x;
}')
Unfortunately, it doesn't work:
> A
A B C
X 1 0 0
Y 0 2 0
Z 0 0 3
> scaleMatrix(A, 2)
> A
A B C
X 1 0 0
Y 0 2 0
Z 0 0 3
I learned from Rcpp FAQ, Question 5.1 that Rcpp should be able to change the object I passed by value. Stealing an example from Dirk's answer to my previous question:
> library(Rcpp)
> cppFunction("void inplaceMod(NumericVector x) { x = x * 2; }")
> x <- as.numeric(1:5)
> inplaceMod(x)
> x
[1] 2 4 6 8 10
I'm confused: is it possible to modify a NumericVector in-place, but not a NumericMatrix?

You can preserve the row and column names by using NumericVector instead of NumericMatrix, keeping in mind that a matrix in R is just a vector with attached dimensions. You can do this switch either when going from R to C++ (scaleVector below) or within C++ (scaleMatrix below, taken from a now deleted answer by @Roland):
library(Rcpp)
cppFunction('
NumericVector scaleVector(NumericVector& A, double x) {
A = A * x;
return A;
}')
cppFunction('
NumericMatrix scaleMatrix(NumericMatrix& A, double x) {
NumericVector B = A;
B = B * x;
return A;
}')
If one applies these two functions to your matrix, the row and column names are preserved. However, the matrix is not changed in place:
A <- diag(1:3)
rownames(A) <- c('X', 'Y', 'Z')
colnames(A) <- c('A', 'B', 'C')
scaleMatrix(A, 2)
#> A B C
#> X 2 0 0
#> Y 0 4 0
#> Z 0 0 6
scaleVector(A, 2)
#> A B C
#> X 2 0 0
#> Y 0 4 0
#> Z 0 0 6
A
#> A B C
#> X 1 0 0
#> Y 0 2 0
#> Z 0 0 3
The reason for that is that diag(1:3) is actually an integer matrix, so a copy is made when you transfer it to a numeric matrix (or vector):
is.integer(A)
#> [1] TRUE
If one uses a numeric matrix to begin with, modification is done in place:
A <- diag(c(1.0, 2.0, 3.0))
rownames(A) <- c('X', 'Y', 'Z')
colnames(A) <- c('A', 'B', 'C')
scaleMatrix(A, 2)
#> A B C
#> X 2 0 0
#> Y 0 4 0
#> Z 0 0 6
scaleVector(A, 2)
#> A B C
#> X 4 0 0
#> Y 0 8 0
#> Z 0 0 12
A
#> A B C
#> X 4 0 0
#> Y 0 8 0
#> Z 0 0 12
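The point that an R matrix is just a vector with attached dimensions can be pictured outside of Rcpp. The following is a plain C++ analogy, not Rcpp code: FlatMatrix and scale_in_place are hypothetical names, and the column-major layout is the assumption being illustrated. Scaling walks the flat buffer and never touches the dimensions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of an R matrix: a flat column-major buffer plus dimensions.
// Hypothetical type for illustration only; this is not an Rcpp class.
struct FlatMatrix {
    std::size_t nrow;
    std::size_t ncol;
    std::vector<double> values;  // element (i, j) lives at values[i + j * nrow]

    double& at(std::size_t i, std::size_t j) {
        assert(i < nrow && j < ncol);  // guard against out-of-range indices
        return values[i + j * nrow];
    }
};

// Scale every element in place; nrow and ncol are left untouched.
void scale_in_place(FlatMatrix& m, double x) {
    for (double& v : m.values) {
        v *= x;
    }
}
```

Calling scale_in_place(m, 2.0) on a 3 x 3 diagonal matrix with 1, 2, 3 on the diagonal leaves the shape alone and doubles the diagonal, which is the same effect the vector view of the matrix has in the answer above.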

Related

Is there an R function to mutate a variable on multiple column conditions using if else or

I am trying to create a variable in my data based on following conditions:
x y Z S T G
1 0 1 0 1 0
1 0 0 0 0 0
1 1 1 0 0 0
1 1 1 1 1 1
if x=1 then 1,
if y=1 then 2 if s=1 then 3,
if t=1 then 4 if G=1 then 5 if X==y==z==1 then 6 and so on.
Please tell me how can i write this using if else
Using if else?
You can calculate it without if else:
v <- 1:6
# this vector assigns each column its value
# 1 2 3 ... 6
# the most tedious part is to get your data into the R session
# as an R matrix.
# I used the fact that a string in R can span multiple lines:
s <- "x y Z S T G
1 0 1 0 1 0
1 0 0 0 0 0
1 1 1 0 0 0
1 1 1 1 1 1"
# it looks like this:
s
## [1] "x y Z S T G\n1 0 1 0 1 0\n1 0 0 0 0 0 \n1 1 1 0 0 0 \n1 1 1 1 1 1"
# after a long time trying base R functions,
# which led to errors and various problems, I found the most elegant way
# to transform this string into a matrix-like tabular form
# is to use tidyverse's read_delim().
# install.packages("tidyverse")
# load tidyverse:
require(tidyverse) # or: library(tidyverse)
tb <- read_delim(s, delim=" ") ## it complains about parsing failures, but
tb
# A tibble: 4 x 6
x y Z S T G
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 1 0 1 0
2 1 0 0 0 0 0
3 1 1 1 0 0 0
4 1 1 1 1 1 1
# so it is read in correctly!
# what you want to do actually is
# to multiply each row with `v` and sum this result:
tb[1, ]
# A tibble: 1 x 6
x y Z S T G
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 1 0 1 0
# you do:
v * tb[1, ]
x y Z S T G
1 1 0 3 0 5 0
# if you build sum with this, then you get your desired numbers
sum(v * tb[1, ])
## [1] 9
# row-wise operations on a matrix/data.frame/tibble are done with apply():
apply(tb, MARGIN=1, FUN=function(row) v * row)
[,1] [,2] [,3] [,4]
x 1 1 1 1
y 0 0 2 2
Z 3 0 3 3
S 0 0 0 4
T 5 0 0 5
G 0 0 0 6
# very often such functions flip the results, so flip it back
# by the transpose function `t()`:
t(apply(tb, MARGIN=1, FUN=function(row) v * row))
x y Z S T G
[1,] 1 0 3 0 5 0
[2,] 1 0 0 0 0 0
[3,] 1 2 3 0 0 0
[4,] 1 2 3 4 5 6
# to get directly the sum by row, do:
apply(tb, MARGIN=1, FUN=function(row) sum(v * row))
## [1] 9 1 6 21
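# these are the values you wanted, aren't they?
# I see now that apply() is not needed at all.
# sweep() multiplies column j by v[j]
# (a plain `tb * v` would recycle v down the columns instead):
sweep(as.matrix(tb), 2, v, "*")
x y Z S T G
[1,] 1 0 3 0 5 0
[2,] 1 0 0 0 0 0
[3,] 1 2 3 0 0 0
[4,] 1 2 3 4 5 6
# therefore the row sums are:
rowSums(sweep(as.matrix(tb), 2, v, "*"))
## [1] 9 1 6 21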
So this is the usual (messy) way one often arrives at a solution.
In the end, it boils down to this (and on Stack Overflow you usually find only short answers like):
Short answer
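require(tidyverse)
v <- 1:6 # weight for each column
s <- "x y Z S T G
1 0 1 0 1 0
1 0 0 0 0 0
1 1 1 0 0 0
1 1 1 1 1 1"
tb <- read_delim(s, delim=" ")
# multiply column j by v[j], then sum each row
rowSums(sweep(as.matrix(tb), 2, v, "*"))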
And this is the beauty of R: If you know exactly what to do, it is just 1-3 lines of code (or a little more) ...

Strange behavior of C++/Rcpp::NumericVector: Updating values of a vector in C++ causes updates in R in different vector [duplicate]

I made a first stab at an Rcpp function via inline and it solved my speed problem (thanks Dirk!):
Replace negative values by zero
The initial version looked like this:
library(inline)
cpp_if_src <- '
Rcpp::NumericVector xa(a);
int n_xa = xa.size();
for(int i=0; i < n_xa; i++) {
if(xa[i]<0) xa[i] = 0;
}
return xa;
'
cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")
But when I called cpp_if(p), it overwrote p with the output, which was not as intended. So I assumed it was passing by reference.
So I fixed it with the following version:
library(inline)
cpp_if_src <- '
Rcpp::NumericVector xa(a);
int n_xa = xa.size();
Rcpp::NumericVector xr(a);
for(int i=0; i < n_xa; i++) {
if(xr[i]<0) xr[i] = 0;
}
return xr;
'
cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")
Which seemed to work. But now the original version doesn't overwrite its input anymore when I re-load it into R (i.e. the same exact code now doesn't overwrite its input):
> cpp_if_src <- '
+ Rcpp::NumericVector xa(a);
+ int n_xa = xa.size();
+ for(int i=0; i < n_xa; i++) {
+ if(xa[i]<0) xa[i] = 0;
+ }
+ return xa;
+ '
> cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")
>
> p
[1] -5 -4 -3 -2 -1 0 1 2 3 4 5
> cpp_if(p)
[1] 0 0 0 0 0 0 1 2 3 4 5
> p
[1] -5 -4 -3 -2 -1 0 1 2 3 4 5
I'm not the only one who has tried to replicate this behavior and found inconsistent results:
https://chat.stackoverflow.com/transcript/message/4357344#4357344
What's going on here?
The key is the 'proxy model': your xa really refers to the same memory location as your original object, so you end up changing the original.
If you don't want that, you should do one thing: make a (deep) copy using the clone() method, or explicitly create a new object into which the altered values get written. Your second version does neither; you simply use two differently named variables which are both "pointers" (in the proxy model sense) to the original variable.
An additional complication, though, is the implicit cast and copy when you pass an int vector (from R) to a NumericVector type: that creates a copy, and then the original no longer gets altered.
Here is a more explicit example, similar to one I use in the tutorials or workshops:
library(inline)
f1 <- cxxfunction(signature(a="numeric"), plugin="Rcpp", body='
Rcpp::NumericVector xa(a);
int n = xa.size();
for(int i=0; i < n; i++) {
if(xa[i]<0) xa[i] = 0;
}
return xa;
')
f2 <- cxxfunction(signature(a="numeric"), plugin="Rcpp", body='
Rcpp::NumericVector xa(a);
int n = xa.size();
Rcpp::NumericVector xr(a); // still points to a
for(int i=0; i < n; i++) {
if(xr[i]<0) xr[i] = 0;
}
return xr;
')
p <- seq(-2,2)
print(class(p))
print(cbind(f1(p), p))
print(cbind(f2(p), p))
p <- as.numeric(seq(-2,2))
print(class(p))
print(cbind(f1(p), p))
print(cbind(f2(p), p))
and this is what I see:
edd@max:~/svn/rcpp/pkg$ r /tmp/ari.r
Loading required package: methods
[1] "integer"
p
[1,] 0 -2
[2,] 0 -1
[3,] 0 0
[4,] 1 1
[5,] 2 2
p
[1,] 0 -2
[2,] 0 -1
[3,] 0 0
[4,] 1 1
[5,] 2 2
[1] "numeric"
p
[1,] 0 0
[2,] 0 0
[3,] 0 0
[4,] 1 1
[5,] 2 2
p
[1,] 0 0
[2,] 0 0
[3,] 0 0
[4,] 1 1
[5,] 2 2
edd@max:~/svn/rcpp/pkg$
So it really matters whether you pass int-to-float or float-to-float.
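The three cases above (a second proxy to the same data, a clone, and an integer input that has to be converted) can be mimicked in plain C++. This is only an analogy under stated assumptions: DoubleProxy and clamp_negatives are hypothetical names, std::shared_ptr stands in for R's memory management, and none of this is how Rcpp is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for a numeric vector proxy: the storage lives in a shared buffer.
// Hypothetical class for illustration; NOT Rcpp's real implementation.
class DoubleProxy {
    std::shared_ptr<std::vector<double>> data_;
public:
    // Wrapping a double buffer makes no copy: the proxy aliases the caller's data.
    explicit DoubleProxy(std::shared_ptr<std::vector<double>> d)
        : data_(std::move(d)) {}

    // Wrapping an int buffer needs a conversion, so a fresh buffer is allocated.
    explicit DoubleProxy(const std::shared_ptr<std::vector<int>>& d)
        : data_(std::make_shared<std::vector<double>>(d->begin(), d->end())) {}

    // Deep copy, playing the role that clone() plays in the answer.
    DoubleProxy clone() const {
        return DoubleProxy(std::make_shared<std::vector<double>>(*data_));
    }

    double& operator[](std::size_t i) {
        assert(i < data_->size());  // bounds guard for the sketch
        return (*data_)[i];
    }
    std::size_t size() const { return data_->size(); }
};

// Replace negative values by zero, writing through the proxy.
// Taking the proxy by value copies only the handle, not the data.
void clamp_negatives(DoubleProxy x) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] < 0) x[i] = 0;
    }
}
```

Passing a double buffer lets clamp_negatives change the caller's data. Passing an int buffer, or the result of clone(), leaves the caller's data alone, which mirrors the integer and numeric runs shown above.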

if else loop not working

I am a beginner in R studio, so hopefully someone can help me with this problem. The case: I want to make an if else loop. I made the following code for an l times m matrix:
for (i in 1:l){
for (j in 1:m){
if (is.na(quantilereturns[i,j]) < quantile(quantilereturns[,j], c(.1), na.rm=TRUE)) {
quantilereturns[i,j]
} else { (0) }
}
}
Summary: I want to make a matrix with values that are smaller than the quantile of a certain vector in the matrix quantilereturns. So when they are smaller than the 10% quantile they get their original value otherwise it will be a zero.
The code doesn't give any errors, but it doesn't change the values in the matrix either.
Can someone help me?
You need to assign the result to a cell of the matrix. I will take the matrix of a recent other thread as an example:
a <- c(4, -9, 2)
b <- c(-1, 3, -8)
c <- c(5, 2, 6)
d <- c(7, 9, -2)
matrix <- cbind(a,b,c,d)
d <- dim(matrix)
rows <- d[1]
columns <- d[2]
print("Before")
print(matrix)
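for (j in 1:columns) {
  # compute the threshold once per column, before any cell in it is changed
  threshold <- quantile(matrix[,j], c(.1), na.rm=TRUE)
  for (i in 1:rows) {
    # compare the cell value itself; is.na() only guards against missing values
    if (!is.na(matrix[i,j]) && matrix[i,j] >= threshold) {
      matrix[i,j] <- 0
    }
  }
}
print("After")
print(matrix)
this gives
[1] "Before"
a b c d
[1,] 4 -1 5 7
[2,] -9 3 2 9
[3,] 2 -8 6 -2
[1] "After"
a b c d
[1,] 0 0 0 0
[2,] -9 0 2 0
[3,] 0 -8 0 -2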
So the essential line you are looking for is matrix[i,j] <- 0

A many-to-one mapping in the natural domain using discrete input variables?

I would like to find a mapping f: X --> N, with multiple discrete natural variables X of varying dimension, where f produces a unique number between 0 and the product of all dimensions minus one. For example, assume X = {a,b,c}, with dimensions |a| = 2, |b| = 3, |c| = 2. f should produce 0 to 11 (2*3*2 = 12 distinct values).
a b c | f(X)
0 0 0 | 0
0 0 1 | 1
0 1 0 | 2
0 1 1 | 3
0 2 0 | 4
0 2 1 | 5
1 0 0 | 6
1 0 1 | 7
1 1 0 | 8
1 1 1 | 9
1 2 0 | 10
1 2 1 | 11
This is easy when all dimensions are equal. Assume binary for example:
f(a=1,b=0,c=1) = 1*2^2 + 0*2^1 + 1*2^0 = 5
Using this naively with varying dimensions we would get overlapping values:
f(a=0,b=1,c=1) = 0*2^2 + 1*3^1 + 1*2^0 = 4
f(a=1,b=0,c=0) = 1*2^2 + 0*3^1 + 0*2^0 = 4
A computationally fast function is preferred as I intend to use/implement it in C++. Any help is appreciated!
OK, the most important part here is the math and the algorithm. You have dimensions of sizes d0, d1, ..., dn (from least significant to most significant). A tuple (x0, x1, ..., xn) with xi < di represents the following number: x0 + d0 * x1 + ... + d0 * d1 * ... * dn-1 * xn
In pseudo-code, I would write:
result = 0
loop for i=n to 0 step -1
result = result * d[i] + x[i]
To implement it in C++, my advice would be to create a class whose constructor takes the number of dimensions and the dimensions themselves (or simply a vector<int> containing the dimensions), and a method that accepts an array or a vector of the same size containing the values. Optionally, you could check that no input value is greater than or equal to its dimension.
A possible C++ implementation could be:
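#include <stdexcept>
#include <vector>

class F {
    std::vector<int> dims; // dims[0] is the least significant dimension
public:
    F(std::vector<int> d) : dims(d) {}
    int to_int(const std::vector<int>& x) const {
        if (x.size() != dims.size()) {
            throw std::invalid_argument("Wrong size");
        }
        int result = 0;
        // Horner's scheme, starting from the most significant digit
        for (int i = static_cast<int>(dims.size()) - 1; i >= 0; i--) {
            if (x[i] < 0 || x[i] >= dims[i]) {
                throw std::invalid_argument("Value outside [0, dimension)");
            }
            result = result * dims[i] + x[i];
        }
        return result;
    }
};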
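The mapping is a mixed-radix number, so it can also be inverted with repeated remainder and division. A minimal sketch, assuming the same digit order as the answer (index 0 is the least significant); encode and decode are hypothetical free functions written for this illustration, not part of the class above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Encode digits x (x[0] least significant) in the mixed radix given by dims.
// Hypothetical helper that mirrors the Horner-style loop of the answer.
long long encode(const std::vector<int>& dims, const std::vector<int>& x) {
    if (x.size() != dims.size()) {
        throw std::invalid_argument("Wrong size");
    }
    long long result = 0;
    for (std::size_t k = dims.size(); k-- > 0;) {
        if (x[k] < 0 || x[k] >= dims[k]) {
            throw std::invalid_argument("Value outside [0, dimension)");
        }
        result = result * dims[k] + x[k];
    }
    return result;
}

// Inverse mapping: peel off one digit per dimension, least significant first.
std::vector<int> decode(const std::vector<int>& dims, long long value) {
    if (value < 0) {
        throw std::out_of_range("Negative value");
    }
    std::vector<int> x(dims.size());
    for (std::size_t k = 0; k < dims.size(); ++k) {
        assert(dims[k] > 0);  // every dimension must have at least one value
        x[k] = static_cast<int>(value % dims[k]);
        value /= dims[k];
    }
    if (value != 0) {
        throw std::out_of_range("Value exceeds the product of the dimensions");
    }
    return x;
}
```

With dims = {2, 3, 2} and the digits given as (c, b, a), encode returns 3 for a=0, b=1, c=1, matching the table in the question, and decode(dims, 3) gives back {1, 1, 0}.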
