Using RcppArmadillo::sample with RcppParallel - C++

I would like to build a parallel version of R's sample() function using RcppParallel, but I lack some C++ experience and time.
I thought about starting from the Matrix transform example on the RcppParallel site
and using the RcppArmadillo::sample function instead of std::transform.
Two questions:
1. Is it possible (i.e., is it thread-safe)?
2. I don't fully grasp the operator() part of the example, or how to change it to use another function (the begin and end usage is confusing to me).
Thank you


Parallel loop operating on/with class members

I'm trying to use OpenMP to parallelize some sections of a relatively complex simulation model of a car that I have been programming in C++.
The whole model comprises several nested classes. Each instance of the class "Vehicle" has four instances of a class "Suspension", and each of those has one instance of the class "Tyre". There's quite a bit more to it, but it shouldn't be relevant to the problem.
I'm trying to parallelize the update of the suspensions on every integration step with code that looks as follows. This code is part of another class containing other simulation data, including one or several cars.
for (int iCar = 0; iCar < this->numberOfCars; iCar++) {
    omp_set_num_threads(4);
    #pragma omp parallel for schedule(static, 1)
    for (int iSuspension = 0; iSuspension < 4; iSuspension++) {
        this->cars[iCar].suspensions[iSuspension].update();
    }
}
I've actually simplified it a bit and changed the variable names, hoping to make it a bit more understandable (and hopefully not masking the problem by doing so!).
The method "update" just computes some data for the corresponding suspension on each time step and saves it in several properties of its own instance of the Suspension class. All instances of the class Suspension are independent of each other, so every call to the method "update" accesses only data contained in the same instance of "Suspension".
The behaviour that I'm getting using the debugger can be described as follows:
The first time the loop is run (at the first time step of the simulation) it runs ok. Always. All four suspensions are updated correctly.
The second time the loop is run, or at the latest on the third, at least one of the suspensions gets updated with corrupted data. It's quite common for two of the suspensions to end up with exactly the same (corrupted) data, which shouldn't be possible, as they are configured from the start with slightly different parameters.
If I run it with one thread instead of four (omp_set_num_threads(1)) it works flawlessly. Needless to say, the same applies when I run it without any OpenMP preprocessor directives.
I'm aware it may not be possible to figure out a solution without knowing how the rest of the program works, but I hope somebody can at least tell me whether there's any reason why you just can't access properties and methods of a class within a parallel OpenMP loop the way I'm trying to do it.
I'm using Windows 10 and Visual Studio 2017 Community. I tried to compile the project with and without optimizations, with no difference.
Thanks a lot in advance!

Can I read a SMT2 file into a solver through the z3 c++ interface?

I've got a problem where the z3 code embedded in a larger system isn't finding a solution to a certain set of constraints (added through the C++ interface) despite some fairly long timeouts. When I dump the constraints to a file (using the to_smt2() method on the solver, just before the call to check()), and run the file through the standalone z3 executable, it solves the system in about 4 seconds (returning sat). For what it's worth, the file is 476,587 lines long, so a fairly big set of constraints.
Is there a way I can read that file back into the embedded solver using the C++ interface, replacing the existing constraints, to see if the embedded version can solve starting from the exact same starting point as the standalone solver? (Essentially, how could I create a corresponding from_smt2(stream) method on the solver class?)
They should be the same set of constraints as now, of course, but maybe there's some ordering effect going on when they are read from the file, or maybe there are some subtle differences in the solver introduced when we embedded it, or something that didn't get written out with to_smt2(). So I'd like to try reading the file back, if I can, to narrow down the possible sources of the difference. Suggestions on what to look for while debugging the long-running version would also be helpful.
Further note: it looks like another user is having similar issues here. Unlike that user, my problem uses all bit-vectors, and the only unknown result is the one from the embedded code. Is there a way to invoke the (get-info :reason-unknown) from the C++ interface, as suggested there, to find out why the embedded version is having a problem?
You can use the method solver::reason_unknown() to retrieve an explanation for the search failure.
There are methods for parsing files and strings into a single expression.
In case of a set of assertions, the expression is a conjunction.
It is perhaps a good idea to add such a method directly to the solver class for convenience. It would be:

void from_smt2_string(char const* smt2benchmark) {
    expr fml = ctx().parse_string(smt2benchmark);
    add(fml);
}
So if you were to write it outside of the solver class, you would do:
expr fml = solver.ctx().parse_string(smt2benchmark);
solver.add(fml);

Different ways to make kernel

In this tutorial, there are two methods to run the kernel, plus another one mentioned in the comments:
1.
cl::KernelFunctor simple_add(cl::Kernel(program,"simple_add"),queue,cl::NullRange,cl::NDRange(10),cl::NullRange);
simple_add(buffer_A,buffer_B,buffer_C);
However, I found out that KernelFunctor is gone.
So I tried the alternative way:
2.
cl::Kernel kernel_add=cl::Kernel(program,"simple_add");
kernel_add.setArg(0,buffer_A);
kernel_add.setArg(1,buffer_B);
kernel_add.setArg(2,buffer_C);
queue.enqueueNDRangeKernel(kernel_add,cl::NullRange,cl::NDRange(10),cl::NullRange);
queue.finish();
It compiles and runs successfully.
However, there is a 3rd option in the comments:
3.
cl::make_kernel simple_add(cl::Kernel(program,"simple_add"));
cl::EnqueueArgs eargs(queue,cl::NullRange,cl::NDRange(10),cl::NullRange);
simple_add(eargs, buffer_A,buffer_B,buffer_C).wait();
This does not compile; I think make_kernel needs template arguments.
I'm new to OpenCL, and didn't manage to fix the code.
My question is:
1. How should I modify the code in option 3 so that it compiles?
2. Which way is better, and why (option 2 vs. option 3)?
You can check the OpenCL C++ Bindings Specification for a detailed description of the cl::make_kernel API (in section 3.6.1), which includes an example of usage.
In your case, you could write something like this to create the kernel functor:
auto simple_add = cl::make_kernel<cl::Buffer&, cl::Buffer&, cl::Buffer&>(program, "simple_add");
Your second question is primarily opinion based, and so is difficult to answer. One could argue that the kernel functor approach is simpler, as it allows you to 'call' the kernel almost as if it were just a function and pass the arguments in a familiar manner. The alternative approach (option 2 in your question) is more explicit about setting arguments and enqueuing the kernel, but more closely represents how you would write the same code using the OpenCL C API. Which method you use is entirely down to personal preference.

How to share a lib between a process and a called script subprocess using SWIG?

I have a C++ program foobar which starts with main() and then the flow of control goes through a first part, then the second part of the program. If I change main to foobar_main, I can then compile the whole program and a SWIG Python wrapper to a shared library foobar.so, and import this to Python, call foobar_main from within Python and everything works fine.
The second part communicates with the first one by some respectable C++ constructs. Specifically: the first part creates some single objects of some classes, and the second part uses class static methods to get those objects.
Now I want to run only the first part from main() and the second part from Python. That is, I want to start the C++ program foobar and then after the first part is finished, run a Python script (programmatically from within C++) that continues with the second part.
To do this, I:
compile the second part and a SWIG wrapper to foobar2.so
replace the second part of C++ code with system("python foobar2.py")
compile the modified C++ program to foobar1.so and load to foobar
write the script foobar2.py, which imports foobar1 and foobar2 and then does the equivalent of the second part
Then I attempt to run foobar. It does not work: it appears that the routines in the second part complain that certain steps which should have been done in the first part are not done.
This is embarrassing, but obviously I have some deep flaws here in my understanding of how computers work :) Can somebody clue me in on what I am missing, including possibly simplifying the above process?
I'm going to assume your C++ code looks like this:
void part1()
{}

void part2()
{}

int main()
{
    part1();
    part2();
}
And that you have a Python version of part2() that is implemented with some other wrapped C++ functions. If these assumptions are wrong let me know.
I think the easiest way to go is to wrap part1() along with the other wrapped part2-related functions, then have a Python script like this:
import foobar
foobar.part1()
py_part2()
This of course means that the program starts in Python. If you need to start a C++ program for some reason (i.e. you need main()), then in order to use py_part2() you'll have to embed the Python interpreter inside your C++ program. This is a much more difficult and involved process; this answer has good info about how to get started.
Since you're learning, I'll explain why system("python foobar2.py") doesn't work. In this scheme your C++ program starts another process (program), named python, and then waits for it to finish. These are two completely different programs that, in your case, don't talk to each other and don't share anything in common. Hence it doesn't work.
In general, reconsider anything that involves system. Its primary use seems to be identifying beginner programmers.

Communication between R and C++

I have a program written in C++ which calculates values of a likelihood function that relies on a lot of data. I want to be able to call the function from R to request function values (the calculations would take too much time in R, and the C++ program is already too long to change; it's approximately 150K lines of code).
I can do this to request one value, but then the C++ application terminates and I have to restart it and load all the data again (I did this with .C()). The loading takes 10-30 seconds, depending on the model for the likelihood function and the data, so I was wondering whether there is a way to keep the C++ application alive, waiting for requests for function values, so I don't have to read all the data back into memory. Calculating one function value in the C++ application already takes around half a second, which is very long for C++.
I was thinking about using pipe() to do this. Is that a feasible option, or should I use some other method? Is it possible to do this with Rcpp?
I'm doing this to test minimizing algorithms for R on this function.
Forget about .C. It is clunky. Perhaps using .C over .Call or .External made sense before Rcpp, but now, with the work we've put into Rcpp, I really don't see the point of using .C anymore. Just use .Call.
Better still, with attributes (sourceCpp and compileAttributes), you don't even have to see the .Call anymore; it just feels like you are using a C++ function.
Now, if I wanted to do something that preserves states, I'd use a module. For example, your application is this Test class. It has methods do_something and do_something_else and it counts the number of times these methods are used:
#include <Rcpp.h>
using namespace Rcpp;

class Test {
public:
    Test() : count(0) {}

    void do_something() {
        // do whatever
        count++;
    }

    void do_something_else() {
        // do whatever
        count++;
    }

    int get_count() {
        return count;
    }

private:
    int count;
};
This is pretty standard C++ so far. Now, to make this available to R, you create a module like this:

RCPP_MODULE(test) {
    class_<Test>("Test")
        .constructor()
        .method("do_something", &Test::do_something)
        .method("do_something_else", &Test::do_something_else)
        .property("count", &Test::get_count)
    ;
}
And then you can just use it:
app <- new( Test )
app$count
app$do_something()
app$do_something()
app$do_something_else()
app$count
There are several questions here.
What is the best way to call C++ code from R?
As other commenters have pointed out, the Rcpp package provides the nicest interface. Using the .Call function from base R is also possible, but it is not as nice as Rcpp.
How do I stop repeatedly passing data back and forth between R and C++?
You'll just have to restructure your code a little bit. Write a wrapper routine in C++ that calls all the existing C++ routines, and call that from R.