Does an R compiler to C/C++ exist?

I'm wondering about the best way to deploy R. Matlab has the "matlab compiler" (MCR). There has been discussion about something similar in the past for R that would compile R into C or C++. Does anyone have any experience with the R to C Compiler (RCC) that was developed by John Garvin at Rice?
I've looked into it, and it seems to be the only project that worked on compiling R code into executable code. As far as I can tell, it is no longer in use.
[Edit 1]: To be clear, I know that there are C and C++ (and Java, Python, etc.) interfaces to R (rJava, Rcpp, RPy, etc.). I'm wondering about specific ways to compile and deploy R code without installing R in advance.
[Edit 2]: John Mellor-Crummey tells me that they're still working on RCC and hope to make it available in 4 months or so (at the earliest). I'll update this further if I find anything else out.

A byte code compiler will be part of the R 2.13 release. It is not used by default in this release, but it is available; I expect the 2.14 release will byte-compile all base and recommended packages by default. The compiler::compile help page and the R Installation and Administration Manual give more details.

I had forgotten about the Rice project; it has been a while. I think the operative phrase here is stated at the top of the project page: Last Updated 3/8/06.
And we all know R changes a lot. So I have only the standard few pointers for you:
Luke Tierney, who not only knows a lot about R internals but equally about byte compilers, has been working on such a project. Nothing ready yet, and it would still work in conjunction with the standard R engine.
Stephen Milborrow has the Ra extension to R that works with his just-in-time compiler package jit.
my Introduction to High-Performance Computing with R tutorials (most recent slides from useR! 2009) cover profiling, compiled extensions, and parallel computing with R, including Rcpp and a bit about RInside.
In short: there is no way to have what you desire: specific ways to compile and deploy R code without installing R in advance. Sorry.
Edit/Update (April 2011): Luke's new compiler package will be part of R 2.13.0 (to be released April 2011) but not 'activated' by default; that is expected for R 2.14.0, due in October 2011.
Edit/Update (December 2011): Prof Tierney just released a massive 100+ page paper on the byte-code compiler.

Why do people get the fear when deploying R? I'm fairly sure I've seen this question before.
Installing R is a piece of cake (you don't actually say which OS you care about). For Windows it's one .exe file: run it, say "yes" a few times, and it's done. I suspect the installer .exe has flags for unattended installation too.

You may check out the P compiler, which implements a subset of R. In particular, lists, matrices, and vectors are implemented, as well as lsfit, chol, svd, ...
You can download a free version at
www.ptechnologies.org
It speeds up computations substantially.
Best,
AS

I haven't used Garvin's package and don't know what is possible along those lines. However:
Typically people just write computationally intensive functions directly in C/C++/Fortran, after profiling to find the bottlenecks. See the Rcpp interface or Calling C functions from R using .C and .Call for examples. The Scythe Statistical Library is also very nice for R users since the syntax/function names are similar.
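As a flavor of the Rcpp route, here is a minimal sketch, assuming a recent Rcpp with attribute support; the function name and body are illustrative. Saved as sumsq.cpp, it can be compiled and loaded from R with Rcpp::sourceCpp("sumsq.cpp") and then called as sumsq(rnorm(1e6)):

    // sumsq.cpp - a toy "bottleneck" moved to C++ via Rcpp (illustrative).
    #include <Rcpp.h>

    // [[Rcpp::export]]
    double sumsq(Rcpp::NumericVector x) {
        double total = 0.0;
        for (int i = 0; i < x.size(); ++i)
            total += x[i] * x[i];   // sum of squares over the whole vector
        return total;
    }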

Related

How to tell if Suitesparse/CHOLMOD is using GPU?

I built Julia, which incorporates SuiteSparse, from scratch. When building the SuiteSparse dependency I ensured the instructions were followed for setting the relevant parts of the SuiteSparse_config.mk file.
However, having completed the build, the execution time for c = A\b with 220k unknowns (very regular structure for A) is unchanged.
How can I test whether CHOLMOD is actively using the GPU or not?
I did notice that something similar was asked here. It was for a C/CUDA environment, but perhaps it applies.
From that answer:
Only the long integer version of CHOLMOD can leverage GPU acceleration.
The long integer version is distinguished by API calls like cholmod_l_start instead of cholmod_start.
It may be the case that Julia does not use the "long integer" version of CHOLMOD calls. I see no evidence for it in cholmod.jl.
As I said earlier, perhaps one of the Julia Language developers will pipe up if you file the issue in the repo. Otherwise, you may need to build Julia after changing cholmod.jl first.
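To make the distinction concrete, here is a minimal sketch of the long-integer flavor, assuming a GPU-enabled SuiteSparse build; note that the useGPU field of cholmod_common may not exist in older versions:

    // Illustrative only: select the long-integer CHOLMOD API and request the GPU.
    #include <cholmod.h>
    #include <cstdio>

    int main() {
        cholmod_common c;
        cholmod_l_start(&c);   // long-integer API: cholmod_l_*, not cholmod_*
        c.useGPU = 1;          // ask CHOLMOD to use the GPU, if compiled in
        std::printf("useGPU = %d\n", c.useGPU);
        // ... build a matrix with cholmod_l_* calls, factorize, solve ...
        cholmod_l_finish(&c);
        return 0;
    }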

Simple C++ source instrumentation?

I want to use Shiny on a large-ish C++ code base, but I'd rather not add the required PROFILE_FUNC() calls to my source by hand. I figure it's easy enough to write a script that, for each source file, regex-searches for function definitions, adds a macro call just after the opening brace, and pipes the result to g++; but that seems such an obvious source-code instrumentation case that I find it hard to believe no one has come up with a better solution already.
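For illustration, the edit I have in mind is no more than this (header name as I understand Shiny's layout):

    // Before: void computeStep() { ...body... }
    // After the hypothetical instrumentation pass:
    #include "Shiny.h"

    void computeStep() {
        PROFILE_FUNC();  // call inserted just after the opening brace
        // ... original function body, untouched ...
    }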
Unfortunately, searching around I could only find references to LLVM / clang instrumentation and the odd research tool, which look like overly complicated solutions to my comparatively simple problem. In fact, there seems to be no straightforward way to perform simple automated code edits to C/C++ code just prior to compilation.
Is that so? Or am I missing something?
Update: I forgot to mention this "C++ code base" is a native application I am porting to Android. So I can use neither gprof (which isn't available on Android), Valgrind (which requires an older version of the NDK than the one I'm using), nor the android-ndk-profiler (which is for dynamic libraries loaded by Android Activities, either Java or native, not plain executables). Hence my looking into Shiny.
Update 2: Despite previous claims I actually managed to build Valgrind on Android NDK r8e, so I settled on using it instead of Shiny. However I still think the original question is valid: isn't there any straightforward tool for effecting simple compile-time edits to C / C++ source files – some sort of macro preprocessor on steroids?
You can consider gprof or valgrind. If memory serves, gprof uses instrumentation and valgrind is a sampling-based profiler. Neither of them requires you to annotate source code.
You can use the android-ndk-profiler to profile C/C++ code; you then use gprof to analyse the results. More info here:
http://code.google.com/p/android-ndk-profiler/

C++ IDE with repl?

I'm looking for a good C++ IDE with a REPL. The one in Visual Studio isn't... well, let's say that most of the time if I copy/paste a line from the source, the REPL rejects it, even if it's the line where I put a breakpoint or stepped over.
Are there any good IDEs or REPLs for C++?
Cling
What is Cling?
Cling is an interactive C++ interpreter, built on top of the LLVM and Clang libraries. Its advantages over the standard interpreters are that it has a command line prompt and uses a just-in-time (JIT) compiler for compilation. Many developers of this kind of software (e.g. Mono in their project called CSharpRepl) call them interactive compilers.
One of Cling's main goals is to provide a contemporary, high-performance alternative to the current C++ interpreter in the ROOT project, CINT. Backward compatibility with CINT is a major priority during development.
http://root.cern.ch/drupal/content/cling
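For a feel of the workflow, statements are typed one at a time at the Cling prompt; roughly like this (exact output format varies by version):

    #include <vector>             // headers can be included interactively
    std::vector<int> v{1, 2, 3};  // declarations persist across inputs
    v.size()                      // no ';' => Cling prints the value,
                                  // e.g.: (unsigned long) 3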
CINT
What is CINT?
CINT is an interpreter for C and C++ code. It is useful e.g. for situations where rapid development is more important than execution time. Using an interpreter, the compile and link cycle is dramatically reduced, facilitating rapid development. CINT makes C/C++ programming enjoyable even for part-time programmers.
CINT is written in C++ itself, with slightly less than 400,000 lines of code. It is used in production by several companies in banking, integrated devices, and even gaming environments, and of course by ROOT, making it the default interpreter for a large number of high energy physicists all over the world.
http://www.hanno.jp/gotom/Cint.html
Cling should be independent of Clang and able to compile on any platform; recent work at CERN tends to separate Cling from Clang, which is a good trend.
What I mostly don't understand is that Clipp exists in C++, allowing JavaScript embedded in my C++ code to be parsed, yet I can't find a version of Clipp for just C++ / Boost / Eigen / QuantLib.
Another thing I don't understand is why TinyCC, at about 200 KB, is able to parse windows.h without a problem, while the LLVM team complains that Clang blows up on windows.h.
All in all, with Fusion, Spirit, Wave, and so many people wishing for a C++ REPL, why after two decades is there not even a small version of one?
Here are my solutions:
First solution: forget about a C++ REPL and stick to a C REPL. Use TinyCC and expose only the functional action of each method through a function pointer, turning A.function(toto t) into function(A*, toto t), as in the sketch below. To make it work with object methods, you can also use a declaration such as struct __declspec(novtable) A { };
This allows binary layout compatibility between TinyCC's understanding of the struct and your true object. True, you will have to split the tuple of data from the tuple of methods, but arguably that should have been the case in the first place: object design could have split data and methods into a dual model rather than a mixed model prone to bugs. In many cases the compiler splits the object into a dual model anyway. This provides extremely fast prototyping, even for scientists and users of Cling/CINT.
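A minimal sketch of that first solution, with all names illustrative:

    struct toto { double x; };

    class A {
    public:
        double factor = 2.0;
        double function(toto t) { return factor * t.x; }
    };

    // C-linkage wrapper: A.function(t) becomes A_function(&a, t), which a C
    // REPL such as TinyCC can call through a plain function pointer.
    extern "C" double A_function(A* self, toto t) {
        return self->function(t);
    }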
Second solution: rather than REPL statements, use the dynamic load/unload pair. You set up a compilation chain (incremental build or not) and automatically reload the compiled library whenever the source changes; a sketch follows. It isn't slow at all, it works on any OS that supports dynamic libraries, and it's very easy to do.
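A minimal POSIX sketch of that second solution, with library and symbol names illustrative; an external watcher is assumed to recompile hot.cpp into hot.so whenever the source changes:

    #include <dlfcn.h>
    #include <cstdio>

    int main() {
        // (Re)open the freshly rebuilt library each time around your edit loop.
        void* handle = dlopen("./hot.so", RTLD_NOW);
        if (!handle) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

        auto run = reinterpret_cast<void (*)()>(dlsym(handle, "hot_entry"));
        if (run) run();       // call into the just-compiled code

        dlclose(handle);      // unload so the next rebuild can be reloaded
        return 0;
    }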
Third solution, the easiest way: boot a Linux-based VM (with the LLVM toolchain installed) and use Cling on the VM. That won't work in a fully Windows-based firm, but it seems the LLVM team are Windows haters.

C++: Quickly determine appropriate list of header includes?

Is there any tool or method that can speed up this process?
For instance I just split neatTrick.cpp source file into two separate files neatTrickImplementation.cpp and neatTrickTests.cpp.
What I have to do now is to go through the list of #includes at the top of neatTrick.cpp and determine which of them need to go into the implementation file, and which need to go into the tests file. Some of the headers are required for both of them, some are not. Some may even be completely unnecessary.
I feel like my process (start with nothing, compile, see what's broken, add the proper include, compile again, repeat) produces the leanest code, but it is frustratingly slow. I think it'd be great if my IDE could analyze the rest of the headers in my project, see which ones could eliminate the current set of errors, and automate this task for me.
There was a talk by Chandler Carruth on Microsoft's "Going Native" (a C++ conference) where he said that the Clang tooling project had something in the pipeline to solve exactly this problem.
From my understanding, it was presented as something no publicly available tool can do at the moment, and most people were pretty impressed by this.
So: at the moment there is no such tool. In the near future you will probably get something like this as a Clang-based tool to compile yourself. Long-term, expect this to be a standard feature built upon a Clang toolchain.
(A bit OT: There currently is a discussion on the Clang/LLVM developers list dealing with a tooling/service infrastructure. The tools are not there yet but are under active development, currently by Google engineers, later probably by people in the whole industry and Clang open source community).
During the ACCU conference at Oxford last April, one of the speakers, Peter Sommerlad, demoed exactly this functionality with a plugin for Eclipse CDT, written by one of his students. I don't know if this plugin is already publicly available, but maybe you could drop him an e-mail to ask...

Edit and Continue on GDB

I know that E&C is a controversial subject and some say that it encourages a wrong approach to debugging, but still - I think we can agree that there are numerous cases when it is clearly useful - experimenting with different values of some constants, redesigning GUI parameters on-the-fly to find a good look... You name it.
My question is: Are we ever going to have E&C on GDB? I understand that it is a platform-specific feature and needs some serious cooperation with the compiler, the debugger and the OS (MSVC has this one easy as the compiler and debugger always come in one package), but... It still should be doable. I've even heard something about Apple having it implemented in their version of GCC [citation needed]. And I'd say it is indeed feasible.
Knowing all the hype about MSVC's E&C (my experience says it's the first thing MSVC users mention when asked "why not switch to Eclipse and gcc/gdb"), I'm seriously surprised that after quite some years GCC/GDB still doesn't have such a feature. Are there any good reasons for that? Is someone working on it as we speak?
It is a surprisingly non-trivial amount of work, encompassing many design decisions and feature tradeoffs. Consider: you are debugging. The debugee is suspended. Its image in memory contains the object code of the source, and the binary layout of objects, the heap, the stacks. The debugger is inspecting its memory image. It has loaded debug information about the symbols, types, address mappings, pc (ip) to source correspondences. It displays the call stack, data values.
Now you want to allow a particular set of possible edits to the code and/or data, without stopping the debuggee and restarting. The simplest might be to change one line of code to another. Perhaps you recompile that file or just that function or just that line. Now you have to patch the debuggee image to execute that new line of code the next time you step over it or otherwise run through it. How does that work under the hood? What happens if the code is larger than the line of code it replaced? How does it interact with compiler optimizations? Perhaps you can only do this on a specially compiled for EnC debugging target. Perhaps you will constrain possible sites it is legal to EnC. Consider: what happens if you edit a line of code in a function suspended down in the call stack. When the code returns there does it run the original version of the function or the version with your line changed? If the original version, where does that source come from?
Can you add or remove locals? What does that do to the call stack of suspended frames? Of the current function?
Can you change function signatures? Add fields to / remove fields from objects? What about existing instances? What about pending destructors or finalizers? Etc.
There are many, many functionality details to attend to in order to make any kind of usable EnC work. Then there are many cross-tool integration issues necessary to provide the infrastructure to power EnC. In particular, it helps to have some kind of repository of debug information that can make the before- and after-edit debug information and object code available to the debugger. For C++, the incrementally updatable debug information in PDBs helps. Incremental linking may help too.
Looking from the MS ecosystem over into the GCC ecosystem, it is easy to imagine the complexity and integration issues across GDB/GCC/binutils, the myriad of targets, some needed EnC specific target abstractions, and the "nice to have but inessential" nature of EnC, are why it has not appeared yet in GDB/GCC.
Happy hacking!
(p.s. It is instructive and inspiring to look at what the Smalltalk-80 interactive programming environment could do. In St80 there was no concept of "restart" -- the image and its object memory were always live, if you edited any aspect of a class you still had to keep running. In such environments object versioning was not a hypothetical.)
I'm not familiar with MSVC's E&C, but GDB has some of the things you've mentioned:
http://sourceware.org/gdb/current/onlinedocs/gdb/Altering.html#Altering
17. Altering Execution
Once you think you have found an error in your program, you might want to find out for certain whether correcting the apparent error would lead to correct results in the rest of the run. You can find the answer by experiment, using the gdb features for altering execution of the program.
For example, you can store new values into variables or memory locations, give your program a signal, restart it at a different address, or even return prematurely from a function.
Assignment: Assignment to variables
Jumping: Continuing at a different address
Signaling: Giving your program a signal
Returning: Returning from a function
Calling: Calling your program's functions
Patching: Patching your program
Compiling and Injecting Code: Compiling and injecting code in GDB
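As a hedged illustration: with a toy debuggee like the one below stopped at a breakpoint, those features map onto commands such as the ones in the comments (the line number is illustrative):

    // Toy program for experimenting with GDB's "Altering Execution" features.
    // After `gdb ./toy`, `break main`, `run`, one can for example:
    //   (gdb) print threshold = 42   // Assignment: store a new value
    //   (gdb) jump 12                // Jumping: resume at another source line
    //   (gdb) return                 // Returning: pop the current frame early
    #include <cstdio>

    int threshold = 10;

    int main() {
        std::printf("threshold = %d\n", threshold);
        return 0;
    }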
This is a pretty good reference to the old Apple implementation of "fix and continue". It also references other working implementations.
http://sources.redhat.com/ml/gdb/2003-06/msg00500.html
Here is a snippet:
Fix and continue is a feature implemented by many other debuggers, which we added to our gdb for this release. Sun Workshop, SGI ProDev WorkShop, Microsoft's Visual Studio, HP's wdb, and Sun's Hotspot Java VM all provide this feature in one way or another. I based our implementation on the HP wdb Fix and Continue feature, which they added a few years back. Although my final implementation follows the general outlines of the approach they took, there is almost no shared code between them. Some of this is because of the architectural differences (both the processor and the ABI), but even more of it is due to implementation design differences.
Note that this capability may have been removed in a later version of their toolchain.
UPDATE: Dec-21-2012
There is a GDB Roadmap PDF presentation that includes a slide describing "Fix and Continue" among other bullet points. The presentation is dated July-9-2012 so maybe there is hope to have this added at some point. The presentation was part of the GNU Tools Cauldron 2012.
Also, I get it that adding E&C to GDB or anywhere in Linux land is a tough chore with all the different components.
But I don't see E&C as controversial. I remember using it in VB5 and VB6, and it was probably there before that. Also, it's been in Office VBA since way back. And it's been in Visual Studio since VS2005. VS2003 was the only one that didn't have it, and I remember devs howling about it. They intended to add it back anyway, and they did with VS2005, and it's been there since. It works with C#, VB, and also C and C++. It's been in MS core tools for 20+ years, almost continuously (counting VB when it was standalone, and subtracting VS2003). But you could still say they had it in Office VBA during the VS2003 period ;)
And JetBrains recently added it to their C# tool Rider. They bragged about it (rightly so, IMO) on their Rider blog.