Communication between R and C++

I have a program written in C++ which calculates values for a likelihood function that relies on a lot of data. I want to be able to call the function from R to request function values (the calculations would take too much time in R, and the C++ program is already too long to change; it's approximately 150K lines of code).
I can do this to request one value, but then the C++ application terminates and I have to restart it and load all the data again (I did this with .C()). The loading takes 10-30 seconds, depending on the model for the likelihood function and the data, so I was wondering whether there is a way to keep the C++ application alive, waiting for requests for function values, so that I don't have to read all the data back into memory. Calculating a single function value in the C++ application already takes around half a second, which is very long for C++.
I was thinking about using pipe() to do this. Is that a feasible option, or should I use some other method? Is it possible to do this with Rcpp?
I'm doing this to test minimizing algorithms for R on this function.

Forget about .C. That is clunky. Perhaps using .C over .Call or .External made sense before Rcpp. But now with the work we've put in Rcpp, I really don't see the point of using .C anymore. Just use .Call.
Better still, with attributes (sourceCpp and compileAttributes), you don't even have to see the .Call anymore; it just feels like you are using a C++ function.
Now, if I wanted to do something that preserves state, I'd use a module. For example, say your application is this Test class. It has methods do_something and do_something_else, and it counts the number of times these methods are used:
#include <Rcpp.h>
using namespace Rcpp ;
class Test {
public:
Test(): count(0){}
void do_something(){
// do whatever
count++ ;
}
void do_something_else(){
// do whatever
count++ ;
}
int get_count(){
return count ;
}
private:
int count ;
} ;
This is pretty standard C++ so far. Now, to make this available to R, you create a module like this :
RCPP_MODULE(test){
class_<Test>( "Test" )
.constructor()
.method( "do_something", &Test::do_something )
.method( "do_something_else", &Test::do_something_else )
.property( "count", &Test::get_count )
;
}
And then you can just use it :
app <- new( Test )
app$count
app$do_something()
app$do_something()
app$do_something_else()
app$count

There are several questions here.
What is the best way to call C++ code from R?
As other commenters have pointed out, the Rcpp package provides the nicest interface. Using the .Call function from base R is also possible, but it is not as nice as Rcpp.
How do I stop repeatedly passing data back and forth between R and C++?
You'll just need to restructure your code a little bit. Write a wrapper routine in C++ that calls all the existing C++ routines, and call that from R.
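To make the wrapper idea concrete, here is a minimal sketch in plain C++ of a wrapper that loads its data once and then serves repeated requests. Everything in it is an assumption for illustration: the names LikelihoodModel and likelihood_value are hypothetical, the hard-coded data stands in for the real 10-30 second load, and the toy formula stands in for the real likelihood.

```cpp
#include <vector>

// Hypothetical stand-in for the existing application: data is loaded once
// and kept in memory between calls.
class LikelihoodModel {
public:
    // Returns the single shared instance, created on first use.
    static LikelihoodModel& instance() {
        static LikelihoodModel model;
        return model;
    }

    // Evaluates a toy "likelihood" (negative sum of squared deviations).
    double evaluate(double theta) {
        ensureLoaded();
        double sum = 0.0;
        for (double x : data_) sum -= (x - theta) * (x - theta);
        return sum;
    }

    // Number of times the (expensive) load actually ran.
    int loadCount() const { return loads_; }

private:
    LikelihoodModel() : loaded_(false), loads_(0) {}

    // Placeholder for the slow data load; runs only on the first request.
    void ensureLoaded() {
        if (loaded_) return;
        data_ = {1.0, 2.0, 3.0};
        loaded_ = true;
        ++loads_;
    }

    std::vector<double> data_;
    bool loaded_;
    int loads_;
};

// The single entry point that R would call repeatedly.
double likelihood_value(double theta) {
    return LikelihoodModel::instance().evaluate(theta);
}
```

With Rcpp, a function like likelihood_value could be exposed to R by putting the // [[Rcpp::export]] attribute above it. Because the shared library stays loaded in the R session, the data should survive between calls.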

Related

How does it work and compile a C++ extension of TCL with a Macro and no main function

I have a working set of Tcl scripts plus a C++ extension, but I don't know exactly how it works or how it was compiled. I am using gcc on Arch Linux.
It works as follows: when we execute the test.tcl script, it passes some values to an object of a class defined in the C++ extension. Using these values, the extension (by means of a macro) produces some results and draws some graphics.
In the test.tcl script I have:
#!object
use_namespace myClass
proc simulate {} {
uplevel #0 {
set running 1
for {} {$running} { } {
moveBugs
draw .world.canvas
.statusbar configure -text "t:[tstep]"
}
}
}
set toroidal 1
set nx 100
set ny 100
set mv_dist 4
setup $nx $ny $mv_dist $toroidal
addBugs 100
# size of a grid cell in pixels
set scale 5
myClass.scale 5
The object.cc looks like:
#include //some includes here
MyClass myClass;
make_model(myClass); // --> this is a macro!
The Macro "make_model(myClass)" expands as follows:
namespace myClass_ns { DEFINE_MYLIB_LIBRARY; int TCL_obj_myClass
(mylib::TCL_obj_init(myClass),TCL_obj(mylib::null_TCL_obj,
(std::string)"myClass",myClass),1); };
The Class definition is:
class MyClass:
{
public:
int tstep; //timestep - updated each time moveBugs is called
int scale; //no. pixels used to represent bugs
void setup(TCL_args args) {
int nx=args, ny=args, moveDistance=args;
bool toroidal=args;
Space::setup(nx,ny,moveDistance,toroidal);
}
The whole thing creates a cell-grid with some dots (bugs) moving from one cell to another.
My questions are:
How do the class methods and variables get the values from the script?
How is it possible to have C++ code and compile it without a main function?
What is that macro doing in the extension, and how does it work?
Thanks
Whenever a command in Tcl is run, it calls a function that implements that command. That function is written in a language like C or C++, and it is passed in the arguments (either as strings or Tcl_Obj* values). A full extension will also include a function to do the library initialisation; the function (which is external, has C linkage, and which has a name like Foo_Init if your library is foo.dll) does basic setting up tasks like registering the implementation functions as commands, and it's explicit because it takes a reference to the interpreter context that is being initialised.
The implementation functions can do pretty much anything they want, but to return a result they use one of the functions Tcl_SetResult, Tcl_SetObjResult, etc. and they have to return an int containing the relevant exception code. The usual useful ones are TCL_OK (for no exception) and TCL_ERROR (for stuff's gone wrong). This is a C API, so C++ exceptions aren't allowed.
It's possible to use C++ instance methods as command implementations, provided there's a binding function in between. In particular, the function has to get the instance pointer by casting a ClientData value (an alias for void* in reality, remember this is mostly a C API) and then invoking the method on that. It's a small amount of code.
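As a rough sketch of such a binding function: the code below is deliberately simplified so that it compiles without tcl.h. ClientData, kOk and kError are stand-ins for the real definitions in the Tcl headers, and the real command signature also takes a Tcl_Interp* and Tcl_Obj* arguments, which are left out here. The MyClass shown is a cut-down, hypothetical version of the class in the question.

```cpp
// Stand-ins so the sketch builds without tcl.h; a real extension would use
// ClientData, TCL_OK and TCL_ERROR from the Tcl headers instead.
using ClientData = void*;
const int kOk = 0;
const int kError = 1;

// Cut-down version of the class from the question.
class MyClass {
public:
    int tstep = 0;
    void moveBugs() { ++tstep; }
};

// C-linkage binding function: recovers the instance from the opaque pointer
// and forwards to the method. Exceptions are caught here because they must
// not propagate into the C caller.
extern "C" int moveBugsCmd(ClientData clientData, int argc,
                           const char* const argv[]) {
    (void)argv;  // the command takes no arguments beyond its own name
    if (clientData == nullptr || argc != 1) return kError;
    try {
        static_cast<MyClass*>(clientData)->moveBugs();
    } catch (...) {
        return kError;
    }
    return kOk;
}
```

In a real extension, the instance pointer is handed over when the command is registered and comes back as the ClientData argument on every call.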
Compiling things is just building a DLL that links against the right library (or libraries, as required). While extensions are usually recommended to link against the stub library, it's not necessary when you're just developing and testing on one machine. But if you're linking against the Tcl DLL, you'd better make sure that the code gets loaded into a tclsh that uses that DLL. Stub libraries get rid of that tight binding, providing pretty strong ABI stability, but are a little more work to set up; you need to define the right C macro to turn them on and you need to do an extra API call in your initialisation function.
I assume you already know how to compile and link C++ code. I won't tell you how to do it, but there are bound to be other questions here on Stack Overflow if you need assistance.
Using the code? For an extension, it's basically just:
# Dynamically load the DLL and call the init function
load /path/to/your.dll
# Commands are all present, so use them
NewCommand 3
There are some extra steps later on to turn a DLL into a proper Tcl package, abstracting code that uses the DLL away from the fact that it is exactly that DLL and so on, but they're not something to worry about until you've got things working a lot more.
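On the macro question, which is not covered directly above: judging from the expansion quoted in the question, make_model appears to define a namespace-scope int whose initialiser calls the registration functions, so the registration runs as a side effect when the library is loaded. That is also why no main is needed. The sketch below shows this idiom in isolation; registry, registerObject and MAKE_MODEL_SKETCH are hypothetical names and not part of the library in the question.

```cpp
#include <map>
#include <string>

// Hypothetical registry standing in for the library's table of Tcl objects.
// A function-local static avoids static-initialisation-order problems.
std::map<std::string, void*>& registry() {
    static std::map<std::string, void*> table;
    return table;
}

// Records the object under its name; returns a dummy value so the call can
// be used as the initialiser of a global variable.
int registerObject(const std::string& name, void* object) {
    registry()[name] = object;
    return 1;
}

struct MyClass { int tstep = 0; };
MyClass myClass;

// Roughly the shape a make_model-style macro could take: it defines a global
// int whose initialiser performs the registration before any command runs.
#define MAKE_MODEL_SKETCH(obj) \
    namespace obj##_ns { int registered_##obj = registerObject(#obj, &obj); }

MAKE_MODEL_SKETCH(myClass)
```

When the shared library is loaded, its global initialisers run, so the object is already registered by the time the script uses it.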

iOS code has become very slow because of objc_msgSend

I have rewritten part of my code from very simple C arrays to using (or trying to use) objects in order to get more structure into it. Instead of passing arrays through the function header, I am now using global arrays held by a singleton. You can see an example of a function in my code below:
it was:
void calcdiv(int nx,int ny,float **u,float **v,
float **divu,float dx,float dy,float **p,
float dt,float rho, float **bp,float **lapp)
{
int i,j;
for (i=2;i<=nx-3;++i){
for (j=2;j<=ny-3;++j){
divu[i][j] = (u[i+1][j]-u[i-1][j])*facu +
(v[i][j+1]-v[i][j-1])*facv;
}
}
...
now it is:
void calcdiv()
{
int i,j;
SingletonClass* gV = [SingletonClass sharedInstance];
for (i=2;i<=gV.nx-3;++i){
for (j=2;j<=gV.ny-3;++j){
gV.divu[i][j] = (gV.u[i+1][j]-gV.u[i-1][j])*facu +
(gV.v[i][j+1]-gV.v[i][j-1])*facv;
}
}
...
Before the restructuring I used the function call shown above, which means passing the pointers to the arrays directly. Now I access the arrays through the singleton ("SingletonClass* gV..."). It works fine, except that it is much slower than before. The profiler tells me that my program spends 41% of its time in objc_msgSend, which was not the case before.
From reading other posts I understand that this can happen when objc_msgSend is called very often. That is most likely the case here, because my program needs a lot of number crunching in order to display an animated flow with OpenGL.
This leads me to my question: what would you suggest? Should I stay with my simple C implementation, or is there a reasonably simple way to speed up the Objective-C version? Please be patient with me, since I am new to Objective-C programming.
Any hints and recommendations are greatly appreciated! Thanks in advance.
If your straight C method works fine, and your Objective-C method puts you at a disadvantage due to method calling, and you need the performance, then there's no reason not to use straight C. From looking at your code, I don't see any advantage to whatever "structure" you're adding, because the working code looks almost precisely the same. In other words, Obj-C doesn't buy you anything here, but straight C does, so go with what's best for your user, because in terms of maintainability and readability, there's no difference between the two implementations.
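If you do want to keep the singleton, one middle ground (not something proposed above, just a common pattern) is to fetch the array pointers and sizes from the object once, before the loops, so the inner loop works on plain locals. The sketch below shows the idea in C++ rather than Objective-C. FlowState is a hypothetical stand-in for the singleton, each accessor call is counted to mimic one message send, and the computation is reduced to the u term on a flattened array.

```cpp
#include <vector>

// Hypothetical stand-in for the singleton; every accessor call is counted,
// playing the role of one objc_msgSend per property access.
class FlowState {
public:
    FlowState(int nx, int ny)
        : nx_(nx), ny_(ny), u_(nx * ny, 0.0f), divu_(nx * ny, 0.0f), calls_(0) {}
    int nx() { ++calls_; return nx_; }
    int ny() { ++calls_; return ny_; }
    float* u() { ++calls_; return u_.data(); }
    float* divu() { ++calls_; return divu_.data(); }
    long calls() const { return calls_; }
private:
    int nx_, ny_;
    std::vector<float> u_, divu_;
    long calls_;
};

// Accessors are called once each, outside the loops; the inner loop then
// touches only local variables and raw pointers.
void calcdiv_hoisted(FlowState& s, float facu) {
    const int nx = s.nx();
    const int ny = s.ny();
    float* u = s.u();
    float* divu = s.divu();
    for (int i = 2; i <= nx - 3; ++i) {
        for (int j = 2; j <= ny - 3; ++j) {
            divu[i * ny + j] = (u[(i + 1) * ny + j] - u[(i - 1) * ny + j]) * facu;
        }
    }
}
```

The number of accessor calls no longer grows with the grid size, which is the part the profiler was complaining about.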

Passing function pointers as an API interface to a compiled library

Dearest stack exchange,
I'm programming an MRI scanner. I won't go into too much background, but I'm fairly constrained in how much code I've got access to, and the way things have been set up is...suboptimal. I have a situation as follows:
There is a big library, written in C++. It ultimately does "transcoding" (in the worst possible way), writing out FPGA assembly that DoesThings. It provides a set of functions to "userland" that are translated into (through a mix of preprocessor macros and black magic) long strings of 16 bit and 32 bit words. The way this is done is prone to buffer overflows, and generally to falling over.*
The FPGA assembly is then strung out over a glorified serial link to the relevant electronics, which executes it (doing the scan), and returning the data back again for processing.
Programmers are expected to use the functions provided by the library to do their thing, in C (not C++) functions that are linked against the standard library. Unfortunately, in my case, I need to extend the library.
There's a fairly complicated chain of preprocessor substitution and tokenization, calling, and (in general) stuff happening between you writing doSomething() in your code, and the relevant library function actually executing it. I think I've got it figured out to some extent, but it basically means that I've got no real idea about the scope of anything...
In short, my problem is:
In the middle of a method, in a deep dark corner of many thousands of lines of code in a big blob I have little control over, with god-knows-what variable scoping going on, I need to:
Extend this method to take a function pointer (to a userland function) as an argument, but
Let this userland function, written after the library has been compiled, have access to variables that are local to both the scope of the method where it appears, as well as variables in the (C) function where it is called.
This seems like an absolute mire of memory management, and I thought I'd ask here for the "best practice" in these situations, as it's likely that there are lots of subtle issues I might run into -- and that others might have lots of relevant wisdom to impart. Debugging the system is a nightmare, and I've not really got any support from the scanner's manufacturer on this.
A brief sketch of how I plan to proceed is as follows:
In the .cpp library:
/* In something::something() */
/* declare a pointer to a function */
void (*fp)(int*, int, int, ...);
/* by default, the pointer points to a placeholder at compile time*/
fp = &doNothing(...);
...
/* At the appropriate time, point the pointer to the userland function, whose address is supplied as an argument to something(): */
fp= userFuncPtr;
/* Declare memory for the user function to plonk data into */
i_arr_coefficients = (int) malloc(SOMETHING_SENSIBLE);
/* Create a pointer to that array for the userland function */
i_ptr_array=&i_arr_coefficients[0];
/* define a struct of pointers to local variables for the userland function to use*/
ptrStrct=createPtrStruct();
/* Call the user's function: */
fp(i_ptr_array,ptrStrct, ...);
CarryOnWithSomethingElse();
The point of the placeholder function is to keep things ticking over if the user function isn't linked in. I get that this could be replaced with a #define, but the compiler's cleverness or stupidity might result in odd (to my ignorant mind, at least) behaviour.
In the userland function, we'd have something like:
void doUsefulThings(i_ptr_array, ptrStrct, localVariableAddresses, ...) {
double a=*ptrStrct.a;
double b=*ptrStrct.b;
double c=*localVariableAddresses.c;
double d=doMaths(a, b, c);
/* I.e. do maths using all of these numbers we've got from the different sources */
storeData(i_ptr_array, d);
/* And put the results of that maths where the C++ method can see it */
}
...
something(&doUsefulThings(i_ptr_array, ptrStrct, localVariableAddresses, ...), ...);
...
If this is as clear as mud please tell me! Thank you very much for your help. And, by the way, I sincerely wish someone would make an open hardware/source MRI system.
*As an aside, this is the primary justification the manufacturer uses to discourage us from modifying the big library in the first place!
You have full access to the C code and limited access to the C++ library code. The C code defines the "doUsefulThings" function. From the C code you call "something" (the C++ function) with a function pointer to "doUsefulThings" as the argument. Control then passes to the C++ library, where the various arguments are allocated and initialised. Then "doUsefulThings" is called with those arguments, and control transfers back to the C code. In short, the main program (C) calls the library (C++), and the library calls the C function.
One of the requirements is that the "userland function should have access to local variables from the C code where it is called". When you call "something" you are only giving it the address of "doUsefulThings". There is no parameter of "something" that captures the addresses of the local variables, so "doUsefulThings" does not have access to those variables.
malloc returns a pointer, and the sketch casts it to int, so this has not been handled properly (probably you were only trying to give us an overview). You must also take care to free the memory somewhere.
Since this is a mixture of C and C++ code, it is difficult to use RAII (to take care of allocated memory), perfect forwarding (to avoid copying variables), lambda functions (to access local variables), etc. Under the circumstances, your approach seems to be the way to go.
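The missing piece described above (no parameter that carries the caller's local variables) is usually handled with an opaque context pointer that the library passes through untouched. Below is a minimal sketch of that pattern. The names UserFn and CallerLocals and the simplified signature of something() are assumptions for illustration, not the scanner library's actual interface.

```cpp
#include <cstddef>
#include <vector>

// Callback type: a library-owned output buffer, its length, and an opaque
// pointer that the library hands back to the user function unchanged.
using UserFn = void (*)(int* out, std::size_t n, void* user_ctx);

// Default placeholder used when no user function is supplied.
void doNothing(int*, std::size_t, void*) {}

// Hypothetical stand-in for the library method. It owns the buffer, calls
// the user function, and returns the sum of what was written into it.
int something(UserFn user_fn, void* user_ctx) {
    UserFn fp = user_fn ? user_fn : &doNothing;
    std::vector<int> coefficients(4, 0);  // freed automatically on return
    fp(coefficients.data(), coefficients.size(), user_ctx);
    int sum = 0;
    for (int c : coefficients) sum += c;
    return sum;
}

// Whatever the calling function wants to share with its callback.
struct CallerLocals {
    double a;
    double b;
};

// Userland function: recovers the caller's variables from the context
// pointer and writes results into the library's buffer.
void doUsefulThings(int* out, std::size_t n, void* user_ctx) {
    const CallerLocals* locals = static_cast<const CallerLocals*>(user_ctx);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<int>(locals->a * locals->b) + static_cast<int>(i);
    }
}
```

The caller fills a CallerLocals on its own stack and passes its address along with the function pointer. The library never needs to know what is inside, and the struct must stay alive until something() returns.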

generating a function call depth using python for source files written in C

I have to find out, using Python, whether a list of variables is modified inside a function written in C. The source files to scan are written in C; there are around 2000 files and around 1000 variables in my project. The main purpose of this script is to check data consistency between the interrupt handlers of different coprocessors.
e.g.
Variable List = [var_w,var_x,var_y,var_z]
/*Module 1.c*/
ISR ()
{
var_x++;
fun_y();
fun_z();
}
/* end of the module 1*/
/* modul2.c */
fun_y() {var_y = 1;}
/* module3.c */
fun_z() { fun_zz();}
fun_zz() {var_z ++;}
/***************/
ISR
->fun_y
->fun_z
->fun_zz
->....
->
.....
..........
So the result of the script should be that var_x, var_y and var_z are modified by the ISR.
Could you please suggest a good way of doing this?
Will it help to use Python Yacc?
Thanking you.
With best regards
You're out of luck.
Theoretically speaking, in its general case, deciding whether a program/function changes a variable (i.e. without running the program) is an undecidable problem. If it were decidable, one could easily solve the halting problem using the program deciding whether a program changes a variable (by reduction).
You could come up with a partial solution, to find some of the cases where variables get changed. But it sounds like it isn't worth the effort.
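For what such a partial solution might look like: assuming some front end has already extracted, for each function, the variables it assigns directly and the functions it calls directly, the remaining step is a walk over the call graph. The sketch below shows only that step, in C++ rather than Python, using the example from the question. It ignores writes through pointers, function pointers and macros, which is exactly where the general problem becomes intractable. FunctionInfo, modifiedBy and exampleProgram are hypothetical names.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Per-function facts, assumed to come from a separate parsing step.
struct FunctionInfo {
    std::set<std::string> writes;    // variables assigned directly in the body
    std::vector<std::string> calls;  // functions called directly
};

using Program = std::map<std::string, FunctionInfo>;

// Collects every variable written by `root` or by anything reachable from
// it through direct calls. Recursion and cycles are handled by `visited`.
std::set<std::string> modifiedBy(const Program& prog, const std::string& root) {
    std::set<std::string> visited;
    std::set<std::string> modified;
    std::vector<std::string> pending{root};
    while (!pending.empty()) {
        std::string fn = pending.back();
        pending.pop_back();
        if (!visited.insert(fn).second) continue;  // already processed
        auto it = prog.find(fn);
        if (it == prog.end()) continue;            // unknown or external function
        modified.insert(it->second.writes.begin(), it->second.writes.end());
        for (const auto& callee : it->second.calls) pending.push_back(callee);
    }
    return modified;
}

// The example from the question, entered by hand.
Program exampleProgram() {
    Program p;
    p["ISR"] = FunctionInfo{{"var_x"}, {"fun_y", "fun_z"}};
    p["fun_y"] = FunctionInfo{{"var_y"}, {}};
    p["fun_z"] = FunctionInfo{{}, {"fun_zz"}};
    p["fun_zz"] = FunctionInfo{{"var_z"}, {}};
    return p;
}
```

For the example, calling modifiedBy(exampleProgram(), "ISR") yields var_x, var_y and var_z, matching the result asked for in the question.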

directly calling from what user inputs and Is there a concept of generating a function at run time?

Is there a way to call a function directly from what the user inputs?
For example: if the user inputs greet, the function named greet is called.
I don't want to write any switch cases or comparisons to select the call.
#include <iostream>
#include<string>
using namespace std;
void nameOfTheFunction(); // prototype
int main() {
string nameOfTheFunction;
getline(cin,nameOfTheFunction); // enter the name of Function
string newString = nameOfTheFunction + "()"; // !!!
cout << newString;
// now call the function nameOfTheFunction
}
void nameOfTheFunction() {
cout << "hello";
}
And is there a concept of generating a function at run time?
You mean run time function generation ??
NO.
But you can use a map if you already know all the strings a user might give as input (i.e. you are limiting the inputs).
For the above you can probably use std::map<std::string, boost::function<...> >
See the boost::function documentation.
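A minimal sketch of the map approach is below. It uses std::function, the standard-library counterpart of boost::function since C++11. The names commandTable and callByName are made up for illustration, and the functions return strings only so that the result is easy to check.

```cpp
#include <functional>
#include <map>
#include <string>

std::string greet() { return "hello"; }
std::string farewell() { return "goodbye"; }

// Table from user-visible names to callables. Only names listed here can
// be invoked, which is the "limited inputs" restriction mentioned above.
const std::map<std::string, std::function<std::string()>>& commandTable() {
    static const std::map<std::string, std::function<std::string()>> table = {
        {"greet", greet},
        {"farewell", farewell},
    };
    return table;
}

// Looks up `name` and calls the matching function. Returns false if the
// name is not in the table, in which case `result` is left unchanged.
bool callByName(const std::string& name, std::string& result) {
    auto it = commandTable().find(name);
    if (it == commandTable().end()) return false;
    result = it->second();
    return true;
}
```

In main you would read the name with getline and pass it to callByName. There is no chain of comparisons in your own code, since the lookup is done by the map.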
In short, no, this isn't possible. Names in C++ get turned into memory offsets (addresses), and then the names are discarded**. At runtime, C++ has no knowledge of the names of the functions or methods it is running.
** If debug symbols are compiled in, then the symbols are there, but impractical to get access to.
Generating a function at runtime has a lot of drawbacks (if it is possible at all), and there is generally no good reason to do it in a language like C++. You should leave that to scripting languages (like Perl or Python), many of which offer an eval() function that can interpret a string as script code and execute it.
If you really, really need to have something like eval() in a compiled language such as C++, you have a few options:
1. Define your own scripting language and write a parser/interpreter for it (lots of work)
2. Define a very simple imperative or math language that can be easily parsed and evaluated using well-known design patterns (like Interpreter)
3. Use an existing scripting language that can be easily integrated into your code through a library (example: Lua)
4. Stuff the strings of code you want to execute at runtime through an external interpreter or compiler and execute them through the operating system or load them into your program using dlopen/LoadLibrary/etc.
(3.) is probably the easiest and best approach. If you want to keep external dependencies to a minimum, or if you need direct access to functionality and state inside your main program, I suggest you go for (2.). Note that you can have callbacks into your own code in that case, so calling native functions from the script is not a problem.
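To give a feel for how small option (2.) can be, here is a sketch of a recursive-descent evaluator for arithmetic expressions with +, -, *, / and parentheses. It is only an illustration under simple assumptions (non-negative numeric literals, no variables, no function calls). ExprParser and evaluate are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Grammar handled by this sketch:
//   expression := term   (('+' | '-') term)*
//   term       := factor (('*' | '/') factor)*
//   factor     := number | '(' expression ')'
class ExprParser {
public:
    explicit ExprParser(const std::string& text) : text_(text), pos_(0) {}

    // Parses the whole input and returns its value; throws on bad input.
    double parse() {
        double value = expression();
        skipSpaces();
        if (pos_ != text_.size()) throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    double expression() {
        double value = term();
        for (;;) {
            skipSpaces();
            if (consume('+')) value += term();
            else if (consume('-')) value -= term();
            else return value;
        }
    }

    double term() {
        double value = factor();
        for (;;) {
            skipSpaces();
            if (consume('*')) value *= factor();
            else if (consume('/')) value /= factor();
            else return value;
        }
    }

    double factor() {
        skipSpaces();
        if (consume('(')) {
            double value = expression();
            skipSpaces();
            if (!consume(')')) throw std::runtime_error("expected ')'");
            return value;
        }
        std::size_t start = pos_;
        while (pos_ < text_.size() &&
               (std::isdigit(static_cast<unsigned char>(text_[pos_])) || text_[pos_] == '.')) {
            ++pos_;
        }
        if (start == pos_) throw std::runtime_error("expected a number");
        return std::stod(text_.substr(start, pos_ - start));
    }

    // Advances past `c` if it is the next character.
    bool consume(char c) {
        if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    void skipSpaces() {
        while (pos_ < text_.size() && std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
    }

    std::string text_;
    std::size_t pos_;
};

// Convenience wrapper: evaluates a user-supplied expression string.
double evaluate(const std::string& text) { return ExprParser(text).parse(); }
```

Callbacks into your own code would be added in factor(), by recognising identifiers and looking them up in a table of native functions.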
If you can opt for a language like Java or C#, there's also the option to use the compiler built into the runtime itself. Have a look here for how to do this in Java