I have started with Rcpp and I am working through Hadley's book / page here.
I guess these basics are more than enough for me, but either I missed one aspect or it is less basic than I think:
How can I assign attributes to an arbitrary R Object using C++?
E.g.:
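#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector attribs(CharacterVector x, NumericVector y) {
  // `out` refers to the same R object as `y` (no copy is made), so if the
  // R argument was already a double vector, the caller's object is tagged too;
  // clone(y) would give an independent copy
  NumericVector out = y;
  out.attr("my-attr") = x;
  return out;
}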
I understand I have to specify the type in C++, but still I wonder whether there's a way to assign an attribute to ANY R object that I pass...
I have seen that setattr in data.table works with C++, but it seems to work only with objects of class data.table. Is there any way other than writing an extra function for every R mode / class?
EDIT: The ultimate purpose is to speed up assigning attributes to each element of a list.
We had a discussion here previously, but it did not involve Rcpp (except for using it via other packages).
Maybe you want something like this? RObject is the generic class for all R objects. Note the use of clone so that you don't accidentally modify the object passed in.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
CharacterVector set_attr(CharacterVector x, RObject y) {
CharacterVector new_x = clone(x);
new_x.attr("my-attr") = y;
return new_x;
}
/*** R
x <- c("a", "b", "c")
set_attr(x, 1)
set_attr(x, "a")
attributes(x)
*/
Pardon my enthusiasm: it's simply amazing how Rcpp helps an absolute novice speed up code like that!
That's why I gave it a try, even though Hadley's answer covers the question perfectly. I tried to turn the input given here into a solution for the more specific case of adding attributes to a list of objects as fast as possible.
Even though my code is probably far from perfect, I was already able to outperform all functions suggested in the discussion, including data.table's setattr. I guess this is probably because I let C++ do not only the assignment but also the looping.
Here's the example and benchmark:
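#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
RObject fx(List x, CharacterVector y){
  int n = x.size();
  // assumes every list element can be treated as a double vector;
  // elements that cannot be coerced to double raise an error
  NumericVector new_el;
  for(int i=0; i<n; i++) {
    new_el = x[i];                  // no copy if the element is already double
    new_el.attr("testkey") = y;     // attach the attribute to this element
    x[i] = new_el;                  // store it back (needed if a coercion happened)
  }
  // x was not copied on entry, so the caller's list is modified in place as well
  return(x);
}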
I need to sort a data frame of prices row by row, in ascending order.
But doing it with a for loop in R is quite slow.
A friend of mine suggested I use Rcpp.
But I'm having quite a hard time writing a loop in C++ that works.
#include <Rcpp.h>
// [[Rcpp::export]]
using namespace std;
List min(NumericVector x)
{
for (unsigned int i = 0; i < x.size(); i++) {
vector<int>& vec = x[i];
NumericVector Value sort(vec.begin(), vec.end());
}
Return Value;
}
I'm not used to C++ and I would like to know why it keeps saying that my sort is wrong.
In short: how do I sort my data frame by row?
Welcome (again) to StackOverflow and Rcpp! Two big worlds with much to discover...
sort() is available as a member function:
> Rcpp::cppFunction("NumericVector srt(NumericVector x) { return(x.sort()); }")
> srt(c(2,3,4,1.5,3.2))
[1] 1.5 2.0 3.0 3.2 4.0
>
Note that an advanced question is hidden inside this simple example: the sort() member function sorts in place, so the above mutates its input. That can be convenient ("hey, no new heap object to return") or confusing, depending on your vantage point. We cover it in most Rcpp tutorials, but you may have other, more pressing issues. Keep at it!
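As a plain C++ sketch of the row-by-row task itself (standard library only, not Rcpp code), the hypothetical `sort_rows` below assumes the prices are already held as one `std::vector<double>` per row. It takes the table by value, so the caller's data is copied and left untouched, unlike the in-place `sort()` member, and then sorts each row with `std::sort`. Getting the rows of your data frame into such a container is a separate step not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: sort every row of a price table in ascending order.
// `rows` is taken by value, so the caller's table is not mutated.
std::vector<std::vector<double>> sort_rows(std::vector<std::vector<double>> rows) {
    for (auto& row : rows) {                 // reference: sort this copy's row in place
        std::sort(row.begin(), row.end());   // ascending order by default
    }
    return rows;                             // returned by value (moved, not copied)
}
```

The loop lives in C++, which is where the speed-up over an R `for` loop would come from; whether it is worth it depends on the size of your data.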
I'm using Stroustrup's matrix.h implementation, as I have a lot of matrix-heavy computation to do. It will make life easier if I can just get the matrix populated!
I'm receiving a complex object with a matrix that is not known until received. Once it enters the method, I can get the row and column count, but I have to use a double i,j loop to pull the values since they are in a cpp17::any structure and I have to convert them using asNumber().
I declare it as follows as part of an object definition:
Matrix<double,2> inputLayer;
In the code that instantiates the object, I have the following code:
int numRows = sourceSeries->rowCount();
int numColumns = sourceSeries->columnCount();
int i,j = 0;
for(i=0; i<numRows; i++){
for(j=0;j<numColumns;j++) {
// make sure you skip the header row in sourceSeries
inputLayer[i][j] = asNumber(sourceSeries->data(i+1,j,ItemDataRole::Display));
}
}
There is nothing like push_back() for the matrix template. The examples I can find in his books and on the web either pre-populate the matrix in the definition or create it from existing lists, which I won't have at this particular time.
Do I need to define a "new" double, receive the asNumber(), then set inputlayer[][] = the "new" double?
I'm hoping not to have to manage the memory myself; I'd like it to be released automatically when it goes out of scope, as with vectors, which is why I was avoiding "new."
I'm using the Boost libraries as well, and I'm wondering if I should try the uBLAS version instead, or just get this one working.
Thanks for the pointers to Eigen, that was so simple! Here's all I had to do:
In the header file:
#include "Eigen/Dense"
using namespace Eigen;
In the object definition of the header file:
Matrix<double, Dynamic, Dynamic> inputLayer;
In the code where I need to read in the matrix:
int numRows = sourceSeries->rowCount();
int numColumns = sourceSeries->columnCount();
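// if this runs inside a member function, resize the member declared in the
// header; declaring a new local MatrixXd inputLayer(numRows, numColumns) here
// would shadow the member, which would then stay empty
inputLayer.resize(numRows, numColumns);
for(int i=0; i<numRows; i++){
for(int j=0;j<numColumns;j++) {
// make sure you skip the header row in sourceSeries
inputLayer(i,j) = asNumber(sourceSeries->data(i+1,j,ItemDataRole::Display));
}
}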
Sorry I had to waste so much time trying to get the other code to work, but at least I got really familiar with my debugger and the codebase again. Thanks everyone for the comments!
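For comparison, the same goals (no `new`, no `push_back()`, memory released when the object goes out of scope) can also be met with the standard library alone. This is a sketch, not Stroustrup's `Matrix` or Eigen: the `Grid` class, the `fill` helper and the lambda standing in for `asNumber()` are all hypothetical. The storage is sized once the row and column counts are known, and cells are then assigned by index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal row-major matrix whose size is chosen at run time.
class Grid {
public:
    Grid() = default;                                    // empty until resized
    void resize(std::size_t rows, std::size_t cols) {
        rows_ = rows;
        cols_ = cols;
        data_.assign(rows * cols, 0.0);                  // allocate and zero-fill
    }
    double& operator()(std::size_t i, std::size_t j) { return data_[i * cols_ + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
private:
    std::size_t rows_ = 0, cols_ = 0;
    std::vector<double> data_;                           // owns the memory (RAII)
};

// Fill the grid cell by cell once the dimensions are known.
template <typename Source>
void fill(Grid& g, std::size_t rows, std::size_t cols, Source to_number) {
    g.resize(rows, cols);
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            g(i, j) = to_number(i + 1, j);               // i + 1 skips a header row, as above
}
```

The `std::vector` member frees its buffer in its own destructor, so there is nothing to `delete` by hand.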
I have a question on why matrix multiplication is %*% in R but just * in C++.
Example:
in R script:
FunR <- function(mX, mY) {
mZ = mX %*% mY
mZInv = solve(mZ)
return(mZInv)
}
in C++ script:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
mat FunC(mat mX, mat mY) {
mat mZ = mX * mY;
mat mZInv = mZ.i();
return mZInv;
}
I ask because C++ can be easily incorporated into R documents.
Also, in R the "*" operator applied to two matrices multiplies them element by element, which is not the standard matrix product as we know it. How are you supposed to know this stuff?
R and C++ are different languages. There is no reason to expect them to share syntax. You should be more surprised when the syntax matches than when it differs.
That being said, when you have a package, like Rcpp, that integrates languages, there usually is some attempt to make the syntax consistent. So why not use the same operator as R in this case? Because it is not possible. The list of operators in C++ is fixed, and %*% is not on that list. The operator * is on the list, though, so that operator could be chosen. Always better to choose something that can be chosen than to have nothing work. :)
(In case it got missed along the way: C++ has no native support for matrix operations. There is no matrix multiplication "in C++", only in specific libraries, such as Armadillo.)
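To make the operator point concrete: a library gives `*` its matrix meaning by overloading `operator*` for its own type, which is conceptually what Armadillo does for `mat` (its real implementation is far more elaborate). The toy `Mat` struct below is hypothetical and exists only to show the mechanism; a token like `%*%` could not be declared this way, because it is not a C++ operator.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy dense matrix stored row-major; illustration only.
struct Mat {
    std::size_t rows, cols;
    std::vector<double> v;
    Mat(std::size_t r, std::size_t c, std::vector<double> values)
        : rows(r), cols(c), v(std::move(values)) {
        if (v.size() != r * c) throw std::invalid_argument("size mismatch");
    }
    double operator()(std::size_t i, std::size_t j) const { return v[i * cols + j]; }
};

// Overload the existing `*` operator so that it means the matrix product.
Mat operator*(const Mat& a, const Mat& b) {
    if (a.cols != b.rows) throw std::invalid_argument("non-conformable");
    Mat out(a.rows, b.cols, std::vector<double>(a.rows * b.cols, 0.0));
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t j = 0; j < b.cols; ++j)
            for (std::size_t k = 0; k < a.cols; ++k)
                out.v[i * out.cols + j] += a(i, k) * b(k, j);
    return out;
}
```

With this in place, `a * b` for two `Mat` objects gives the same result as `%*%` would in R for the corresponding matrices.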
I am pretty competent with Python, but I'm pretty new to C++ and things like pointers. I am trying to write some code for solving ODEs with the Eigen library for linear algebra (I will need to deal with lots of matrices later, so I plan to start with it). I have the following code for RK4, and it works:
#include "../eigen-eigen-b3f3d4950030/Eigen/Dense"
using namespace Eigen;
VectorXd Func(const VectorXd& a)
{ // equations for solving simple harmonic oscillator
Vector2d ans;
ans(0) = a(1); // dy/dt
ans(1) = -a(0); // d2y/dt2
return ans;
}
MatrixXd RK4(VectorXd Func(const VectorXd& y), const Ref<const VectorXd>& y0, double h, int step_num)
{
MatrixXd y(step_num, y0.rows());
y.row(0) = y0;
for (int i=1; i<step_num; i++){
VectorXd y_old = y.row(i-1).transpose();
VectorXd k1 = h*Func(y_old);
VectorXd k2 = h*Func(y_old+k1/2);
VectorXd k3 = h*Func(y_old+k2/2);
VectorXd k4 = h*Func(y_old+k3);
VectorXd dy = (k1 + 2*k2 + 2*k3 + k4)/6;
y.row(i) = y.row(i-1) + dy.transpose();
}
return y;
}
int main()
{
Vector2d v1;
v1(0) = 1.4; v1(1) = -0.1;
double h = 0.1;
int step_num = 50;
MatrixXd sol = RK4(Func,v1,h,step_num);
return 0;
}
I have the following questions:
What's the meaning of & in the function argument? Pass by reference? I just copied the code from the official documentation, but I'm not too sure if I understand every bit in the function arguments of RK4 such as VectorXd Func(const VectorXd& y). Are there alternative ways of accepting Eigen::MatrixXd and functions which accept Eigen::MatrixXd as function arguments?
From what I understand, we cannot return a whole 2D array from a function, and what we are returning is just the first element of the array (correct me if I'm wrong). What about Eigen::MatrixXd? What are we actually passing/returning? The first element of the matrix, or a completely new object defined by the Eigen library?
I'm not sure if this code is written efficiently. Is there anything I can do to optimize this part? (I'm just wondering if I have done anything that may significantly slow it down.)
Thanks
Yes, & means pass-by-reference. The second one, VectorXd Func(const VectorXd& y), is the syntax for passing a function that takes a vector by reference and returns a vector. An Eigen::Matrix should always be passed by reference. There are tons of ways to pass one function to another; the most idiomatic ones in C++ are probably template arguments and std::function.
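A small standard-library sketch of those two idioms, using `std::vector<double>` instead of Eigen types so it stands alone. `euler_step_template` accepts any callable through a template parameter, while `euler_step_function` accepts a type-erased `std::function`. The names and the use of a single explicit Euler step (rather than RK4, to keep it short) are illustrative assumptions; the state is passed by `const` reference so it is not copied on the way in.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using State = std::vector<double>;

// Right-hand side of the simple harmonic oscillator: y0' = y1, y1' = -y0.
State oscillator(const State& a) { return {a[1], -a[0]}; }

// Idiom 1: the callable's type is a template parameter, so calls can be inlined.
template <typename F>
State euler_step_template(F f, const State& y, double h) {
    State dy = f(y);
    State out(y.size());
    for (std::size_t i = 0; i < y.size(); ++i) out[i] = y[i] + h * dy[i];
    return out;
}

// Idiom 2: std::function erases the type, so any matching callable fits.
State euler_step_function(const std::function<State(const State&)>& f,
                          const State& y, double h) {
    return euler_step_template(f, y, h);  // reuse the same arithmetic
}
```

The template version is usually the faster choice; `std::function` is handy when the callable has to be stored or chosen at run time.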
You can't have multiple return values, but you can return a pair, a tuple, or a Matrix object. RK4 returns a whole matrix object, not just its first element.
The code is fairly efficient. If it was really performance-critical there might be a few things that could be optimized, but I would not worry for now.
The biggest point is that RK4 is very general and works with dynamically sized types, which are a lot more expensive than their statically sized counterparts (VectorXd vs Vector2d). But using fixed sizes would require you to create a specialized version for every dimension you are interested in, or to get the compiler to do it for you by using templates.
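A standard-library sketch of the template approach, assuming `std::array<double, N>` as a stand-in for Eigen's fixed-size vectors (this is not Eigen code). The dimension `N` is a template parameter that the compiler deduces from `y0`, the right-hand side is a template callable, and the whole trajectory is returned by value as a `std::vector`, which also shows that a C++ function can return a complete object. `Vec`, `axpy` and `rk4` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Fixed-size state vector: N is a compile-time constant, like Eigen's Vector2d.
template <std::size_t N>
using Vec = std::array<double, N>;

// Returns y + s * k, element by element.
template <std::size_t N>
Vec<N> axpy(const Vec<N>& y, double s, const Vec<N>& k) {
    Vec<N> out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = y[i] + s * k[i];
    return out;
}

// Classical RK4 for y' = f(y). Returns the whole trajectory (step_num >= 1
// states, the first being y0) by value; N is deduced from y0.
template <std::size_t N, typename F>
std::vector<Vec<N>> rk4(F f, const Vec<N>& y0, double h, int step_num) {
    std::vector<Vec<N>> y{y0};
    for (int i = 1; i < step_num; ++i) {
        const Vec<N> yo = y.back();          // copy: push_back below may reallocate
        const Vec<N> k1 = f(yo);
        const Vec<N> k2 = f(axpy(yo, h / 2, k1));
        const Vec<N> k3 = f(axpy(yo, h / 2, k2));
        const Vec<N> k4 = f(axpy(yo, h, k3));
        Vec<N> next{};
        for (std::size_t j = 0; j < N; ++j)
            next[j] = yo[j] + h / 6 * (k1[j] + 2 * k2[j] + 2 * k3[j] + k4[j]);
        y.push_back(next);
    }
    return y;
}
```

Here `k1` to `k4` are slopes and `h` is applied when combining them, which is algebraically the same update as the question's version, where each `k` already includes the factor `h`.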
Generally: Read a good book to get you started.
I mainly use R, but eventually would like to use Rcpp to interface with some C++ functions that take in and return 2d numeric arrays. So to start out playing around with C++ and Rcpp, I thought I'd just make a little function that converts my R list of variable-length numeric vectors to the C++ equivalent and back again.
require(inline)
require(Rcpp)
test1 = cxxfunction(signature(x='List'), body =
'
using namespace std;
List xlist(x);
int xlen = xlist.size();
vector< vector<int> > xx;
for(int i=0; i<xlen; i++) {
vector<int> test = as<vector<int> > (xlist[i]);
xx.push_back(test);
}
return(wrap(xx));
'
, plugin='Rcpp')
This works like I expect:
> test1(list(1:2, 4:6))
[[1]]
[1] 1 2
[[2]]
[1] 4 5 6
Admittedly I am only partway through the very thorough documentation, but is there a nicer (i.e. more Rcpp-like) way to do the R -> C++ conversion than the for loop? I suspect not, since the documentation mentions that (at least with the built-in methods) as "offers less flexibility and currently handles conversion of R objects into primitive types", but I wanted to check because I'm very much a novice in this area.
I will give you bonus points for a reproducible example, and of course for using Rcpp :) And then I will take those away for not asking on the rcpp-devel list...
As for converting STL types: you don't have to, but when you decide to do it, the as<>() idiom is correct. The only 'better way' I can think of is to do name lookup as you would in R itself:
require(inline)
require(Rcpp)
set.seed(42)
xl <- list(U=runif(4), N=rnorm(4), T2df=rt(4,2))
fun <- cxxfunction(signature(x="list"), plugin="Rcpp", body = '
Rcpp::List xl(x);
std::vector<double> u = Rcpp::as<std::vector<double> >(xl["U"]);
std::vector<double> n = Rcpp::as<std::vector<double> >(xl["N"]);
std::vector<double> t2 = Rcpp::as<std::vector<double> >(xl["T2df"]);
// do something clever here
return(R_NilValue);
')
Hope that helps. Otherwise, the list is always open...
PS As for the two-dim array, that is trickier, as C++ has no native two-dimensional array type whose size is set at run time. If you actually want to do linear algebra, look at RcppArmadillo and RcppEigen.
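One detail that matters when moving two-dimensional numeric data between R and C++ is that R stores a matrix as a single vector in column-major order. The minimal standard-library sketch below assumes you already have that flat buffer and its dimensions, and rebuilds the rows as nested `std::vector`s. The function name `rows_from_col_major` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Rebuild rows from a column-major buffer: element (i, j) lives at data[i + j * nrow].
std::vector<std::vector<double>> rows_from_col_major(const std::vector<double>& data,
                                                     std::size_t nrow, std::size_t ncol) {
    if (data.size() != nrow * ncol) throw std::invalid_argument("dimension mismatch");
    std::vector<std::vector<double>> rows(nrow, std::vector<double>(ncol));
    for (std::size_t j = 0; j < ncol; ++j)
        for (std::size_t i = 0; i < nrow; ++i)
            rows[i][j] = data[i + j * nrow];
    return rows;
}
```

For example, the data of R's `matrix(1:6, nrow = 2)` is stored as 1, 2, 3, 4, 5, 6, and this function turns it into the rows {1, 3, 5} and {2, 4, 6}.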