Rcpp, making functions available to R from DLL - c++

New to C++, I would like to make functions compiled in a DLL available in R.
So far, I managed to do that for a basic function taking integer as input and returning the same, by following these steps in VisualStudio, then using dyn.load from R to load the created dll.
However, my C++ functions will need to handle R data.frame objects, and I am not sure how to make that possible. I saw from the Rcpp gallery that Rcpp might include some kind of translations between R and c++ data types including data.frame objects, but I don't know if I can generate a dll using Rcpp that I can then include in R using dyn.load.
From this answer by Dirk Eddelbuettel, it seems possible to generate a "dynamic library" using Rcpp. However, I could not find any DLL when I tried generating a package with a source .cpp file using Rcpp.package.skeleton(). The function I'd like to have a DLL for is from the Rcpp gallery:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
DataFrame modifyDataFrame(DataFrame df) {
    // access the columns
    IntegerVector a = df["a"];
    CharacterVector b = df["b"];
    // make some changes
    a[2] = 42;
    b[1] = "foo";
    // return a new data frame
    return DataFrame::create(_["a"]= a, _["b"]= b);
}
I pasted that code into Visual Studio to try to generate the DLL, but I get the error "cannot find Rcpp.h", which I rather expected.
I then followed these steps in RStudio:
Create new project / Package
Include this cpp source file as a source file for this package
Load Rcpp and run Rcpp.package.skeleton("mypackage"). So far, no DLL appears in the package folders.
Tried to build the package in RStudio by going to Build/Install and Restart, but then I get an error message "Building R Packages needs installation of additional build tools, do you want to install them?" However I already have RbuildTools 3.4 installed, and when I click "YES" in RStudio nothing happens.
PS: Happy to hear about other methods but here the DLL format should be used if possible. Any piece of info is greatly appreciated since I have basically no idea of how Rcpp or C++ work

You need to figure out why your setup is hosed. This is meant to be easy and it is easy. Watch:
R> Rcpp::cppFunction('DataFrame modDF(DataFrame df) { IntegerVector a = df["a"]; CharacterVector b = df["b"]; a[2] = 42; b[1] = "foo"; return DataFrame::create(Named("a")=a, Named("b")=b); } ')
R> df <- data.frame(a=1:3, b=letters[24:26])
R> df
  a b
1 1 x
2 2 y
3 3 z
R> modDF(df)
   a   b
1  1   x
2  2 foo
3 42   z
R>
Now, I obviously don't recommend writing this way in a long one-liner. You are already on the right track setting up a package. But you need to sort out what is holding up your tools.
And as a PS the one-liner above with better indentation:
R> Rcpp::cppFunction('DataFrame modDF(DataFrame df) { \
       IntegerVector a = df["a"]; \
       CharacterVector b = df["b"]; \
       a[2] = 42; \
       b[1] = "foo"; \
       return DataFrame::create(Named("a")=a, Named("b")=b); \
   } ')

The following seems to work:
From R, ran Rcpp.package.skeleton("dfcpp4", cpp_files="modifyDataFrame.cpp"). The second argument is required for the modifyDataFrame function to be available from the DLL via dyn.load.
From the command line, ran R CMD build dfcpp4.
From the command line, ran R CMD check dfcpp4 --no-manual.
The DLL file is now present in the src-x64 folder.
I am now able to call this function using
dyn.load("dfcpp4/src-x64/dfcpp4.dll")
df <- data.frame(a = c(1, 2, 3),
b = c("x", "y", "z"))
.Call("_dfcpp4_modifyDataFrame", df)
   a   b
1  1   x
2  2 foo
3 42   z
What I don't get is why in this case .Call should be used instead of .C...
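On the .Call versus .C question, as far as I understand it: .C is geared toward plain atomic vectors. The routine receives pointers to the vector contents (double*, int*, ...), must return void, and hands results back by overwriting its arguments. The wrapper that Rcpp generates, _dfcpp4_modifyDataFrame, instead takes and returns full R objects (SEXP), which is the calling convention .Call provides and what a data.frame needs. For contrast, a minimal sketch of a .C-style routine (logabs_c is a hypothetical name, not part of the package above):

```cpp
#include <cmath>

// Hypothetical .C-style routine: R would pass a pointer to the vector's
// data and a pointer to its length. Nothing is returned; x is overwritten.
extern "C" void logabs_c(double* x, int* n) {
    for (int i = 0; i < *n; ++i) {
        x[i] = std::log(std::fabs(x[i]));
    }
}
```

After dyn.load, something like .C("logabs_c", x = as.double(v), n = length(v))$x would retrieve the modified vector. The Rcpp version of modifyDataFrame has no such pointer-only signature, hence .Call.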

Related

R package does not work properly because C++ source code (/src) is not compiled

I have been struggling with installing an R package containing some clustering algorithms from Github using this command:
require("devtools")
install_github("bfatimah/OASW")
(https://github.com/bfatimah/OASW/)
For the package to work properly for what I am doing, I have to install some other packages, for example:
require("cluster")
require("mclust")
require("nnet")
I installed these packages and loaded them into R. However, when I tried to run this block of code:
n <- 100
K <- 2
dmat <- TwoGaussian(n)$data
dys <- dist(dmat)
initClustering <- init(dmat, K, distmethod = "euclidean")
osilClustering <- osilFix(dys, n, K, initClustering$lab_best)
plot(dmat, col = osilClustering$clus_lab, pch = 16, cex = 1.5)
It returned an error:
> osilClustering <- osilFix(dys, n, K, initClustering$lab_best)
Error in sil_lab_swap(K, n, clus_lab, alt_clus_lab, clus_size, disty, :
object '_oasw_sil_lab_swap' not found
Called from: sil_lab_swap(K, n, clus_lab, alt_clus_lab, clus_size, disty,
iter, dys_i, avg_dys_clus, silh, altsilh)
It turns out that there is a folder (/src) in the package containing C++ programs, and the "R" functions in the packages only act as wrapper of C++ functions in /src. None of the C++ programs seem to be compiled.
I just do not know how to fix this problem as it is not my expertise at all. Is there any advice? Thanks a lot!
P.S.: It worked after I restarted R and ran the whole program again. It had failed previously because of a conflict.

matlab script to c++

I am trying to implement this person's code: http://iris.lib.neu.edu/civil_eng_theses/30/ but I cannot generate a working executable in Visual Studio because I am missing two files: myoutput.cpp and PrintTimeReferencedVariable.cpp. It is unclear to me whether the MATLAB builder is supposed to generate them or whether I have to write them myself.
The myoutput.cpp is supposed to be a C++ file that does essentially a dlmwrite. It is used as follows:
function [] = Output_01(FileName, FileNameLenght, OutputData, OutputSize1, OutputSize2)
coder.inline('never')
coder.ceval( 'myoutput', FileName, FileNameLenght, OutputData, OutputSize1, OutputSize2);
end
% %% Matlab Version: to run the model in Matlab uncomment below, comment above
% function [] = Output_01(FileName, FileNameLenght, OutputData, OutputSize1, OutputSize2)
% dlmwrite(FileName, OutputData,'-append','coffset',0,'delimiter','\t','precision','%6.6G')
% precision = 4;
% disp( num2str( OutputData,precision))
% end
Here the commented-out block is the MATLAB version of the function, and the block above it is what gets built into C++. The files are supposed to be written to the directory C:\matlab-results-from-cpp.
The PrintTimeReferencedVariable.cpp looks like a date function that should do:
% Date = datestr(now, 'yyyy-mm-dd-HH-MM-SS' );
%% Matlab Version: to run the model in Matlab uncomment above, comment below
Date = '2012-09-16-09-19-20';
coder.ceval('PrintTimeReferencedVariable', coder.wref(Date));
Any insight into this would be a great help. I'm still waiting to hear back from the author, but it would still be helpful to hear your input, as this is my first time using C++ and building projects in MATLAB.
Thanks!
Seems like the text is pretty clear on what you do, though I can understand the confusion not coming from a C++ background:
5) “Project Settings” adjustments: Open ‘More settings’, then in ‘All Settings’ tab, choose for ‘Language’ C++; In “Speed” tab mark only “Saturate on integer overflow”, unmark everything else; in “Custom Code” tab, copy and paste the first lines the functions “myoutput.cpp” and “PrintTimeReferencedVariable.cpp”, followed by semicolon or see below the text:
void PrintTimeReferencedVariable (char * DateOut);
void myoutput(const char* _filename, int fileNameLen, double* data, int n, int m);
So you create the files "myoutput.cpp" and "PrintTimeReferencedVariable.cpp" yourself. Copy and paste the code into the respective .cpp files, then add the prototypes (the second quote) to your header or at the top of those files.
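If the original sources cannot be obtained, the two functions could be sketched roughly as follows. This is not the thesis author's code, only a guess that matches the two prototypes and the commented-out MATLAB versions. Assumptions: data is an n-by-m matrix in MATLAB's column-major layout; _filename may lack a terminating NUL (which would explain the fileNameLen argument); and DateOut points at the 19-character array 'yyyy-mm-dd-HH-MM-SS' with no room for a NUL.

```cpp
#include <cstdio>
#include <cstring>
#include <ctime>
#include <string>

// Sketch of PrintTimeReferencedVariable.cpp. Assumption: DateOut holds
// exactly 19 characters, so only 19 characters are copied (no NUL).
void PrintTimeReferencedVariable(char* DateOut) {
    char buf[32];
    std::time_t now = std::time(nullptr);
    std::tm* local = std::localtime(&now);
    if (local != nullptr &&
        std::strftime(buf, sizeof buf, "%Y-%m-%d-%H-%M-%S", local) == 19) {
        std::memcpy(DateOut, buf, 19);
    }
}

// Sketch of myoutput.cpp. Assumption: data is n rows by m columns,
// stored column-major, and is appended as tab-delimited text.
void myoutput(const char* _filename, int fileNameLen, double* data, int n, int m) {
    std::string path(_filename, static_cast<std::size_t>(fileNameLen));
    std::FILE* fp = std::fopen(path.c_str(), "a");  // append, like '-append'
    if (fp == nullptr) return;
    for (int row = 0; row < n; ++row) {
        for (int col = 0; col < m; ++col) {
            // same precision string as the dlmwrite call in the MATLAB version
            std::fprintf(fp, "%6.6G%s", data[col * n + row],
                         col + 1 < m ? "\t" : "\n");
        }
    }
    std::fclose(fp);
}
```

Each function would go into its own .cpp file, with the two prototypes from the quote above placed in the "Custom Code" tab.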

Running compiled C++ code with Rcpp

I have been working my way through Dirk Eddelbuettel's Rcpp tutorial here:
http://www.rinfinance.com/agenda/
I have learned how to save a C++ file in a directory and call it and run it from within R. The C++ file I am running is called 'logabs2.cpp' and its contents are directly from one of Dirk's slides:
#include <Rcpp.h>
using namespace Rcpp;

inline double f(double x) { return ::log(::fabs(x)); }

// [[Rcpp::export]]
std::vector<double> logabs2(std::vector<double> x) {
    std::transform(x.begin(), x.end(), x.begin(), f);
    return x;
}
I run it with this R code:
library(Rcpp)
sourceCpp("c:/users/mmiller21/simple r programs/logabs2.cpp")
logabs2(seq(-5, 5, by=2))
# [1] 1.609438 1.098612 0.000000 0.000000 1.098612 1.609438
I am running the code on a Windows 7 machine from within the R GUI that seems to install by default. I also installed the most recent version of Rtools. The above R code seems to take a relatively long time to run. I suspect most of that time is devoted to compiling the C++ code and that once the C++ code is compiled it runs very quickly. Microbenchmark certainly suggests that Rcpp reduces computation time.
I have never used C++ until now, but I know that when I compile C code I get an *.exe file. I have searched my hard drive for a file called logabs2.exe but cannot find one. I am wondering whether the above C++ code might run even faster if a logabs2.exe file were created. Is it possible to create a logabs2.exe file, store it in a folder somewhere, and then have Rcpp call that file whenever I want to use it? I do not know whether that makes sense. If I could store a C++ function in an *.exe file, then perhaps I would not have to compile the function every time I wanted to use it with Rcpp, and perhaps the Rcpp code would be even faster.
Sorry if this question does not make sense or is a duplicate. If it is possible to store the C++ function as an *.exe file I am hoping someone will show me how to modify my R code above to run it. Thank you for any help with this or for setting me straight on why what I suggest is not possible or recommended.
I look forward to seeing Dirk's new book.
Thank you to user1981275, Dirk Eddelbuettel and Romain Francois for their responses. Below is how I compiled a C++ file and created a *.dll, then called and used that *.dll file inside R.
Step 1. I created a new folder called 'c:\users\mmiller21\myrpackages' and pasted the file 'logabs2.cpp' into that new folder. The file 'logabs2.cpp' was created as described in my original post.
Step 2. Inside the new folder I created a new R package called 'logabs2' using an R file I wrote called 'new package creation.r'. The contents of 'new package creation.r' are:
setwd('c:/users/mmiller21/myrpackages/')
library(Rcpp)
Rcpp.package.skeleton("logabs2", example_code = FALSE, cpp_files = c("logabs2.cpp"))
I found the above syntax for Rcpp.package.skeleton on one of Hadley Wickham's websites: https://github.com/hadley/devtools/wiki/Rcpp
Step 3. I installed the new R package "logabs2" in R using the following line in the DOS command window:
C:\Program Files\R\R-3.0.1\bin\x64>R CMD INSTALL -l c:\users\mmiller21\documents\r\win-library\3.0\ c:\users\mmiller21\myrpackages\logabs2
where:
the location of the rcmd.exe file is:
C:\Program Files\R\R-3.0.1\bin\x64>
the location of installed R packages on my computer is:
c:\users\mmiller21\documents\r\win-library\3.0\
and the location of my new R package prior to being installed is:
c:\users\mmiller21\myrpackages\
Syntax used in the DOS command window was found by trial and error and may not be ideal. At some point I pasted a copy of 'logabs2.cpp' in 'C:\Program Files\R\R-3.0.1\bin\x64>' but I do not think that mattered.
Step 4. After installing the new R package I ran it using an R file I named 'new package usage.r' in the 'c:/users/mmiller21/myrpackages/' folder (although I do not think the folder was important). The contents of 'new package usage.r' are:
library(logabs2)
logabs2(seq(-5, 5, by=2))
The output was:
# [1] 1.609438 1.098612 0.000000 0.000000 1.098612 1.609438
This file loaded the package Rcpp without me asking.
In this case base R was faster assuming I did this correctly.
#> microbenchmark(logabs2(seq(-5, 5, by=2)), times = 100)
#Unit: microseconds
# expr min lq median uq max neval
# logabs2(seq(-5, 5, by = 2)) 43.086 44.453 50.6075 69.756 190.803 100
#> microbenchmark(log(abs(seq(-5, 5, by=2))), times=100)
#Unit: microseconds
# expr min lq median uq max neval
# log(abs(seq(-5, 5, by = 2))) 38.298 38.982 39.666 40.35 173.023 100
However, using the dll file was faster than calling the external cpp file:
system.time(
cppFunction("
NumericVector logabs(NumericVector x) {
return log(abs(x));
}
")
)
# user system elapsed
# 0.06 0.08 5.85
Although base R seems faster or as fast as the *.dll file in this case, I have no doubt that using the *.dll file with Rcpp will be faster than base R in most cases.
This was my first attempt creating an R package or using Rcpp and no doubt I did not use the most efficient methods. Also, I apologize for any typographic errors in this post.
EDIT
In a comment below I think Romain Francois suggested I modify the *.cpp file to the following:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector logabs(NumericVector x) {
return log(abs(x));
}
and recreate my R package, which I have now done. I then compared base R against my new package using the following code:
library(logabs)
logabs(seq(-5, 5, by=2))
log(abs(seq(-5, 5, by=2)))
library(microbenchmark)
microbenchmark(logabs(seq(-5, 5, by=2)), log(abs(seq(-5, 5, by=2))), times = 100000)
Base R is still a tiny bit faster or no different:
Unit: microseconds
                         expr    min     lq median     uq       max neval
   logabs(seq(-5, 5, by = 2)) 42.401 45.137 46.505 69.073 39754.598 1e+05
 log(abs(seq(-5, 5, by = 2))) 37.614 40.350 41.718 62.234  3422.133 1e+05
Perhaps this is because base R is already vectorized. I suspect with more complex functions base R will be much slower. Or perhaps I am still not using the most efficient approach, or perhaps I simply made an error somewhere.
You say
I have never used C++ until now, but I know that when I compile C code
I get an *.exe file
and that is true if and only if you build an executable. Here, we build dynamically loadable libraries, and those tend to have different extensions depending on the operating system: .dll for Windows, .so for Linux, .dylib for OS X.
So nothing wrong here, you simply had the wrong assumption.
If you want to get some entity you can keep, what you are looking for is an R package. There are many resources online to learn how to make them (e.g. Hadley's slides).
We have Rcpp.package.skeleton you might find useful.
So, the function is compiled once when the package is installed, and then you just use it.

Boost spirit calculator example run

I am a Spirit beginner and am studying it at the moment. I am at this example, a simple calculator. I compiled and ran the program successfully. When run, the program says to type some statements and then type . to compile and run them. I typed the following in separate runs, and after each line I typed a . (period).
2
2;
2*2
2*2;
x=2
x=2;
But none of them works. Every time it says "parsing failed.". What am I missing, or is there something wrong with the example? The example program's grammar is here. Note that I am aware I am not using the latest Spirit; I use version 1.46.1, which is the default in Ubuntu 12.04.
You appear to have missed the fact that the program parses statements, not bare expressions, see http://www.boost.org/doc/libs/1_46_1/libs/spirit/example/qi/calc6/calc6c.hpp
So try this:
var y;
var x = 6;
y = 3 * x;
Outputs:
-------------------------
Parsing succeeded
-------------------------
Results------------------
x: 6
y: 18
-------------------------
Bye... :-)
Hope that helps. And consider upgrading boost - installing it from source is really simple on Debian/Ubuntus.

Tracking code versions in an executable

I have a reasonably sized (around 40k lines) machine learning system written in C++. It is still in active development, and I need to run experiments regularly even as I make changes to my code.
The output of my experiments is captured in simple text files. What I would like to do when looking at these results is have some way of figuring out the exact version of the code that produced it. I usually have around 5 to 6 experiments running simultaneously, each on a slightly different version of the code.
I would like to know for instance that a set of results was obtained by compiling version 1 of file A, version 2 of file B etc (I just need some identifier and the output of "git describe" will do fine here ).
My idea is to somehow include this info when compiling the binary. This way, this can be printed out along with the results.
Any suggestions on how this can be done in a nice way? In particular, is there a nice way of doing this with git?
I generate a single source file as part of my build process that looks like this:
static const char version_cstr[] = "93f794f674 (" __DATE__ ")";
const char * version()
{
    return version_cstr;
}
Then it's easy to log the version out on startup.
I originally used a DEFINE on the command line, but that meant that on every version change everything got recompiled by the build system, which is not nice for a big project.
Here's the fragment of scons I use for generating it, maybe you can adapt it to your needs.
import subprocess

# Let's get the version from git
# first get the base version
git_sha = subprocess.Popen(["git", "rev-parse", "--short=10", "HEAD"], stdout=subprocess.PIPE).communicate()[0].decode().strip()
p1 = subprocess.Popen(["git", "status"], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["grep", "Changed but not updated\\|Changes to be committed"], stdin=p1.stdout, stdout=subprocess.PIPE)
result = p2.communicate()[0].decode().strip()
# mark the build if the working tree has uncommitted changes
if result != "":
    git_sha += "[MOD]"
print("Building version %s" % git_sha)

def version_action(target, source, env):
    """
    Generate file with current version info
    """
    with open(target[0].path, 'w') as fd:
        fd.write("static const char version_cstr[] = \"%s (\" __DATE__ \")\";\nconst char * version()\n{\n return version_cstr;\n}\n" % git_sha)
    return 0

build_version = env.Command('src/autogen/version.cpp', [], Action(version_action))
env.AlwaysBuild(build_version)
You can use $Id$ in your source file, and Git will substitute it with the SHA-1 hash of that file's blob if you give the file the "ident" attribute in .gitattributes (see gitattributes).
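A sketch of how such a keyword might be consumed from C++. Assumptions: the file carries the ident attribute, so that on checkout the literal $Id$ becomes $Id: followed by 40 hex digits and a closing $; extract_ident is a hypothetical helper. Since the hash names the blob, it identifies this one file's contents rather than the whole commit, so it may be less convenient than "git describe" for the use case in the question.

```cpp
#include <string>

// Hypothetical helper: pull the hash out of an expanded "$Id: <hash> $"
// keyword; returns an empty string for the unexpanded form "$Id$".
std::string extract_ident(const std::string& keyword) {
    const std::string prefix = "$Id: ";
    const std::string suffix = " $";
    if (keyword.size() < prefix.size() + suffix.size()) return "";
    if (keyword.compare(0, prefix.size(), prefix) != 0) return "";
    if (keyword.compare(keyword.size() - suffix.size(), suffix.size(), suffix) != 0) return "";
    return keyword.substr(prefix.size(),
                          keyword.size() - prefix.size() - suffix.size());
}

// Git rewrites this literal on checkout when the ident attribute is set.
static const char file_id[] = "$Id$";
```

Logging extract_ident(file_id) at startup would then print the blob hash, or nothing if the keyword was never expanded.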