I moved some code from R into C++ to make my model run faster. The C++ code returns a list of two items: a vector called "trace" and a matrix called "weights". Once the C++ code has run, I would like to reassign "weights" and "trace" in R to the values it computed. Unfortunately, when I tried this I got the error: "Error: cannot change value of locked binding for 'weights'". I searched for a way to unlock the binding and found unlockBinding. I added it to my R code, but I still get the same error as before. Am I putting the unlockBinding call in the wrong place? Am I using it correctly? "weights" and "trace" do exist in the global environment, so why are they not unlocking?
I assign the list that the C++ code returns to the variable "result", then call unlockBinding, then reassign "weights" and "trace" to what the C++ code computed. Here is the code:
batch <- function(n.training){
    for(i in 1:n.training){
        g <- input.correlation()
        for(o in 1:nrow(g)){
            result <- traceUpdate(g[o,], trace, weights, trace.param, learning.rate)
            unlockBinding("weights", .GlobalEnv)
            unlockBinding("trace", .GlobalEnv)
            weights <<- result$weights
            trace <<- result$trace
        }
    }
}
Here is the part of my C++ code that returns a list of 2 items, one being a matrix "weights" and the other being a vector "trace":
List traceUpdate(NumericVector input, NumericVector trace, NumericMatrix weights, double traceParam, double learningRate){
    NumericVector output = forwardPass(input, trace.size(), weights);
    for(int i = 0; i<trace.size(); i++){
        trace[i] = (1 - traceParam) * trace[i] + traceParam * output[i];
        for(int j=0; j<input.size(); j++){
            double w = weights(j,i);
            if(w >= 0){
                weights(j,i) = w + learningRate * trace[i] * input[j] - learningRate * trace[i] * w;
            }
        }
    }
    //return weights
    return List::create(Rcpp::Named("weights") = weights,
                        Rcpp::Named("trace") = trace);
}
If I simply reassign the weights like this:
weights <- matrix(0, nrow = 100, ncol = 20)
they do all change to zeroes and I do not get an error.
Also, when looking for solutions online I came across a way to unlock environments in R, but I'm pretty sure that's not what's wrong because the environment is not locked.
I'm new to this site and relatively new to programming so I apologize if my question is dumb or formatted incorrectly etc.
Thank you.
Just received a solution from my professor:
"Instead of treating weights and trace as global variables that are modified from inside the batch function, we can pass them into the function and return them out:"
batch <- function(weights, trace, n.training){
    for(i in 1:n.training){
        g <- input.correlation()
        for(o in 1:nrow(g)){
            result <- traceUpdate(g[o,], trace, weights, trace.param, learning.rate)
            weights <- result$weights
            trace <- result$trace
        }
    }
    return(list(weights=weights, trace=trace))
}
result <- batch(weights, trace, 50)
weights <- result$weights
trace <- result$trace
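The same "pass it in, return it out" pattern carries over to the C++ side. As a minimal sketch in plain C++ (no Rcpp), assuming the forward-pass output is computed elsewhere and handed in, the update can take the current state by value and return the new state. TraceState and traceUpdateStep are hypothetical names, and std::vector stands in for NumericVector and NumericMatrix:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the two pieces of state (not in the original code).
struct TraceState {
    std::vector<double> trace;                 // one entry per output unit
    std::vector<std::vector<double>> weights;  // weights[j][i]: input j to output i
};

// Takes the state by value and returns the updated copy, so the caller's
// variables only change through an explicit assignment.
TraceState traceUpdateStep(const std::vector<double>& input,
                           const std::vector<double>& output,
                           TraceState state,
                           double traceParam,
                           double learningRate) {
    assert(output.size() == state.trace.size());
    assert(state.weights.size() == input.size());
    for (std::size_t i = 0; i < state.trace.size(); ++i) {
        state.trace[i] = (1 - traceParam) * state.trace[i] + traceParam * output[i];
        for (std::size_t j = 0; j < input.size(); ++j) {
            const double w = state.weights[j][i];
            if (w >= 0) {  // same guard as the original: negative weights are left alone
                state.weights[j][i] = w + learningRate * state.trace[i] * input[j]
                                        - learningRate * state.trace[i] * w;
            }
        }
    }
    return state;
}
```

A caller would write state = traceUpdateStep(input, output, state, traceParam, learningRate); which mirrors weights <- result$weights in the R version.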
I am trying to solve an LP model in CPLEX using C++ and Concert Technology.
I want to implement constraints (the subtour elimination constraints, to be more specific) that need to query the values of two of my variables in the current solution:
The variable array xvar indicates the edges; yvar represents the nodes.
I implement these constraints by solving n (= number of nodes) min-cut problems on a modified graph, which is constructed by adding an artificial source and an artificial sink and connecting these to every node of the original graph.
From what I've read so far, I am not sure: do I need a lazy constraint, a callback, or neither?
This is where I create the model, solve it, and access the values of the variables in the solution:
// Step 2: Construct the necessary CPLEX objects and the LP model
IloCplex solver(env);
std::cout << "Original Graph g: " << std::endl;
std::cout << net.g() << std::endl;
MCFModel model(env, net);
// Step 3: Load the model into cplex and solve
solver.extract(model);
solver.solve();
// Step 4: Extract the solution from the solver
if(solver.getStatus() != IloAlgorithm::Optimal) throw "Could not solve to optimality!";
IloNumArray xsol ( env, net.g().nEdges() );
IloNumArray ysol ( env, net.g().nNodes() );
IloNumArray rsol ( env, net.g().nGroups() );
IloNumArray wisol ( env, net.g().nGroups() );
IloNum ksol;
NumMatrix wsol ( env, net.g().nGroups());
for(IloInt i = 0; i < net.g().nGroups(); i++){
    wsol[i] = IloNumArray( env, net.g().nGroups() );
}
solver.getValues(xsol, model.xvar());
solver.getValues(ysol, model.yvar());
solver.getValues(rsol, model.rvar());
solver.getValues(wisol, model.wivar());
ksol = solver.getValue(model.kvar());
for (IloInt i = 0; i < net.g().nGroups(); i++){
    wsol[i] = wisol;
}
// Step 5: Print the solution.
The constraint for which I need the current values of the variables xvar and yvar is created here:
//build subset constraint y(S) -x(E(S))>= y_i
void MCFModel::buildSubsetCons(){
    IloExpr lhs(m_env);
    IloCplex cplex(m_env);
    IloNumArray xtemp ( m_env, m_net.g().nEdges() );
    IloNumArray ytemp ( m_env, m_net.g().nNodes() );
    std::vector<Edge> mg_twin;
    std::vector<int> mg_weights;
    int mg_s;
    int mg_t;
    SGraph mgraph;
    std::vector<int> f;
    int nOrigEdges = m_net.g().nEdges();
    int nOrigNodes = m_net.g().nNodes();
    cplex.getValues(xtemp, m_xvar);
    cplex.getValues(ytemp, m_yvar);
    mgraph = m_net.g().mod_graph();
    mg_s = mgraph.nNodes()-1;
    mg_t = mgraph.nNodes();
    std::cout<<"modified graph:"<<std::endl;
    std::cout<<mgraph<<std::endl;
    // fill the weight of original edges with 1/2*x_e
    foreach_edge(e, m_net.g()){
        mg_weights.push_back((xtemp[e->idx()])/2);
    }
    // fill the weight of the edges from artificial source with zero
    for(int i=0; i<m_net.g().nNodes(); i++){
        mg_weights.push_back(0);
    }
    // fill the weight of the edges to artificial sink with f(i)
    // first step: calculate f(i):
    //f.resize(m_net.g().nNodes());
    foreach_node(i, m_net.g()){
        foreach_adj_edge(e, i, m_net.g()){
            f[i] = f[i] + xtemp[e->idx()];
        }
        f[i] = (-1)*f[i]/2;
        f[i] = f[i] + ytemp[i];
    }
    // second step: fill the weights vector with it
    for(int i=0; i<m_net.g().nNodes(); i++){
        mg_weights.push_back(f[i]);
    }
    // calculate the big M = abs(sum_(i in N) f(i))
    int M;
    foreach_node(i, m_net.g()){
        M = M + abs(f[i]);
    }
    // Build the twin vector of the not artificial edges for mgraph
    mg_twin.resize(2*nOrigEdges + 2*nOrigNodes);
    for(int i=0; i < nOrigEdges ; ++i){
        mg_twin[i] = mgraph.edges()[nOrigEdges + i];
        mg_twin[nOrigEdges + i] = mgraph.edges()[i];
    }
    //Start the PreflowPush for every node in the original graph
    foreach_node(v, m_net.g()){
        // "contract" the edge between s and v
        // this equals to higher the weights of the edge (s,v) to a big value M
        // weight of the edge from v to s lies in mg_weights[edges of original graph + index of node v]
        mg_weights[m_net.g().nEdges() + v] = M;
        //Start PreflowPush for v
        PreflowPush<int> pp(mgraph, mg_twin, mg_weights, mg_s, mg_t);
        std::cout << "Flowvalue modified graph: " << pp.minCut() << std::endl;
    }
}
The object pp solves the min-cut problem on the modified graph mgraph with the artificial source and sink. The original graph is in m_net.g().
When I compile and run it, I get the following error:
terminate called after throwing an instance of 'IloCplex::Exception'
Aborted
It seems to me that it is not possible to access the values of xvar and yvar like this?
I would appreciate any help, since I am quite lost as to how to do this.
Thank you very much!!
Two things...
I. I strongly suggest using a try-catch to better understand CPLEX exceptions; it may reveal the nature of the exception you are getting. In fact, I suggest a try-catch-catch arrangement, something like:
try {
    //... YOUR CODE ...//
}
catch(IloException& e) {
    cerr << "CPLEX found the following exception: " << e << endl;
    e.end();
}
catch(...) {
    cerr << "The following unknown exception was found: " << endl;
}
II. The only way to interact with CPLEX during the optimization process is via a callback and, in the case of subtour elimination constraints (SECs), you will need to separate both integer and fractional SECs.
II.1 INTEGER: The first one is the easier one. An O(n) routine will identify all the connected components of a node solution; you can then add the corresponding cuts to prevent this particular subtour from appearing in other nodes. You can enforce your cuts either locally, i.e. only on the current subtree, using the addLocal() function, or globally, i.e. on the entire branch-and-cut tree, using the add() function. In any case, ALWAYS remember to call .end() on the cut container when you are done with it. Otherwise you WILL have serious memory leak issues, trust me on this, lol. This separation needs to be done via a lazy constraint callback (ILOLAZYCONSTRAINTCALLBACK).
II.2 FRACTIONAL: The second one is by far more complex. The easiest way to make it work is to use Professor Lysgaard's CVRPSEP library. It is nowadays the most efficient way of computing capacity, multistar, generalized multistar, framed capacity, strengthened comb and hypotour cuts. Additionally, it is rather easy to link with existing code. The separation also needs to be embedded in the solution process, hence a callback is also required. In this case, it would be a user cut callback (ILOUSERCUTCALLBACK).
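For the integer case in II.1, the component search itself does not need CPLEX. Here is a minimal sketch in plain C++, assuming the graph is given as a list of node-index pairs and that xsol holds the edge values read inside the callback. The names connectedComponents, edges and xsol are hypothetical and are not part of Concert Technology:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Labels every node with a component id, following only edges whose value in
// the current integer solution is (close to) 1.
std::vector<int> connectedComponents(int nNodes,
                                     const std::vector<std::pair<int, int>>& edges,
                                     const std::vector<double>& xsol) {
    assert(edges.size() == xsol.size());
    std::vector<std::vector<int>> adj(nNodes);
    for (std::size_t e = 0; e < edges.size(); ++e) {
        if (xsol[e] > 0.5) {  // edge is selected in this solution
            adj[edges[e].first].push_back(edges[e].second);
            adj[edges[e].second].push_back(edges[e].first);
        }
    }
    std::vector<int> comp(nNodes, -1);  // -1 means "not visited yet"
    int nComp = 0;
    for (int s = 0; s < nNodes; ++s) {
        if (comp[s] != -1) continue;
        // Depth-first search from s over the selected edges.
        std::vector<int> stack(1, s);
        comp[s] = nComp;
        while (!stack.empty()) {
            const int u = stack.back();
            stack.pop_back();
            for (int v : adj[u]) {
                if (comp[v] == -1) {
                    comp[v] = nComp;
                    stack.push_back(v);
                }
            }
        }
        ++nComp;
    }
    return comp;
}
```

If more than one component comes back, each component's node set is a candidate S for a subtour elimination cut, which the lazy constraint callback would then add.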
One is glad to be of service
Y
I want to use Rcpp to make certain parts of my code more efficient. I have a main R function in which my objects are defined; inside this R function I call several Rcpp functions that take the R data objects as inputs. This is one example of an Rcpp function that is called from the R function:
void calculateClusters ( List order,
                         NumericVector maxorder,
                         List rank,
                         double lambda,
                         int nbrClass,
                         NumericVector nbrExamples) {
    int current, i;
    for ( current = 0; current < nbrClass; current ++ ) {
        maxorder [current] = 0;
        for ( i = 0; i < nbrExamples [ current ]; i++ ) {
            order[ current ][i] = ( int ) ( rank[ current ][i] / lambda ) - 1;
        }
        if ( order[ current ][i] > maxorder[ current ] ) {
            maxorder[ current ] = order[ current ][i];
        }
    }
}
This function calculates the maximum number of clusters for each class. In native C++ I would define my List as an int** and my NumericVector as an int*. However, in Rcpp this gives an error. I know the fault lies in the subsetting of these Lists (I handled them the same way as int**).
My question is how I can successfully transform these int** into List without losing flexibility. For example, the Lists order and distance have the structure order[[1]][1:500], order[[2]][1:500], which would be exactly the same as an int** in C++, where it would be order[1][1:500], order[2][1:500]. If there are 3 classes, the order and distance Lists change to order[[1]][1:500], order[[2]][1:500], order[[3]][1:500]. How can I do this in Rcpp?
Briefly:
Example from the Rcpp Gallery
Example from the Rcpp Examples package on CRAN
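On the int** versus List question: as far as I know, the usual Rcpp approach is to pull each list element into a typed vector first (for example IntegerVector o = order[current];) and index that, rather than writing order[current][i] directly on the List. The loop logic itself can be sketched in plain C++ with nested std::vector standing in for the List of vectors. This is only an illustration under assumptions: the std::vector layout is a stand-in for the Rcpp types, and the maximum check is moved inside the inner loop, since in the posted code it runs after the loop with i already one past the last element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// rank[c][i] is the rank of example i in class c, mirroring rank[[c]][i] in R.
// Fills order[c][i] and the per-class maximum maxorder[c].
void calculateClusters(std::vector<std::vector<int>>& order,
                       std::vector<int>& maxorder,
                       const std::vector<std::vector<double>>& rank,
                       double lambda) {
    assert(lambda > 0.0);
    order.assign(rank.size(), std::vector<int>());
    maxorder.assign(rank.size(), 0);
    for (std::size_t c = 0; c < rank.size(); ++c) {
        order[c].resize(rank[c].size());  // each class may have its own length
        for (std::size_t i = 0; i < rank[c].size(); ++i) {
            order[c][i] = static_cast<int>(rank[c][i] / lambda) - 1;
            if (order[c][i] > maxorder[c]) {
                maxorder[c] = order[c][i];
            }
        }
    }
}
```

The number of classes and the number of examples per class come from the sizes of the nested vectors, so no separate nbrClass or nbrExamples arguments are needed in this version.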
QUESTION SUMMARY:
I have a [5 x 72580] matrix. I am trying to fit a Gaussian Mixture Model (GMM) to this data using the gmm_diag.learn() method with random_subset as the initial seeding mode. Why does Armadillo display "gmm_diag::learn(): no existing means" and fail to learn the model?
PROBLEM DETAILS:
I am working on a machine learning algorithm, the aim of which is to identify a writer from their handwriting. I am using supervised learning to train our model with a GMM.
All the training data is read from XML files. After calculating the features, their values are stored into a linked list. After this, the number of elements in the list is counted and is used to initialize an Armadillo mat(rix) variable at runtime as shown below:
int totFeatureVectors = CountPointClusterElements(TRAINING_CLUSTER_LIST_INDEX);
printf("\n%d elements added to list\n",totFeatureVectors);
mat F = mat(NUM_POINT_BASED_FEATURES, totFeatureVectors, fill::zeros);
Here TRAINING_CLUSTER_LIST_INDEX and NUM_POINT_BASED_FEATURES are a couple of configurable, project level constants; for my program NUM_POINT_BASED_FEATURES = 5 and totFeatureVectors = 72580. So the variable F is a [5 x 72580] dimensional matrix of double values. After initialization, I am reading the feature values from the linked list into F as follows:
int rowInd=0, colInd=0;
PointClusterElement *iterator = allClusterPointsList;
while(iterator!=NULL)
{
    F(rowInd,colInd)=iterator->pointSample.speed;
    rowInd += 1;
    F(rowInd,colInd)=iterator->pointSample.dirn.cosComponent;
    rowInd += 1;
    F(rowInd,colInd)=iterator->pointSample.dirn.sinComponent;
    rowInd += 1;
    F(rowInd,colInd)=iterator->pointSample.curv.cosComponent;
    rowInd += 1;
    F(rowInd,colInd)=iterator->pointSample.curv.sinComponent;
    rowInd += 1;
    if(rowInd==NUM_POINT_BASED_FEATURES)
    {
        rowInd=0;
        colInd += 1;
    }
    iterator=iterator->nextClusterElement;
}
The feature values are assigned to F in a column-major manner, i.e. each column of F holds one feature vector after assignment. I am even writing the values of F into a text file to verify that all the feature values have been set properly, and that works without any problems:
FILE *fp = fopen(PROGRAM_DATA_OUTPUT_PATH,"w");
if(fp!=NULL)
{
    int r,c;
    for(c=0; c<totFeatureVectors; c++)
    {
        for(r=0; r<NUM_POINT_BASED_FEATURES; r++)
        {
            fprintf(fp,"%lf\t",F(r,c));
        }
        fprintf(fp,"\n");
    }
}
fclose(fp);
So far, so good. But after this, when I declare a gmm_diag variable and try to fit a GMM to F using its learn() method, the program displays a warning "gmm_diag::learn(): no existing means" and quits, thus failing to learn the GMM (here the VARIANCE_FLOORING_FACTOR = 0.001)
gmm_diag writerModel;
bool result = writerModel.learn(F, 20, maha_dist, random_subset, 100, 100, VARIANCE_FLOORING_FACTOR, true);
writerModel.dcovs.print("covariances:\n");
writerModel.hefts.print("weights:\n");
writerModel.means.print("means:\n");
if(result==true)
{
    printf("\nModel learnt");
}
else if(result==false)
{
    printf("\nModel not learnt");
}
I opened up the learn() method on my IDE and from what I could make out, this error (warning) message is displayed only when the initial seeding mode is keep_existing. The source file I referred to is at /usr/include/armadillo_bits/gmm_diag_meat.hpp
My question is - why would this happen even when my seeding is done using the random_subset mode? How exactly should I proceed to get my model to learn? Not sure what I am missing here... The documentation and code samples provided at http://arma.sourceforge.net/docs.html#gmm_diag were not too helpful (the short program here works even without initializing the means of the GMM). The code is given below
int main(int argc, char** argv) {
    int totFeatureVectors = CountPointClusterElements(TRAINING_CLUSTER_LIST_INDEX);
    printf("\n%d elements added to list\n",totFeatureVectors);
    mat F = mat(NUM_POINT_BASED_FEATURES, totFeatureVectors, fill::zeros);
    int rowInd=0, colInd=0;
    PointClusterElement *iterator = allClusterPointsList;
    while(iterator!=NULL)
    {
        F(rowInd,colInd)=iterator->pointSample.speed;
        rowInd += 1;
        F(rowInd,colInd)=iterator->pointSample.dirn.cosComponent;
        rowInd += 1;
        F(rowInd,colInd)=iterator->pointSample.dirn.sinComponent;
        rowInd += 1;
        F(rowInd,colInd)=iterator->pointSample.curv.cosComponent;
        rowInd += 1;
        F(rowInd,colInd)=iterator->pointSample.curv.sinComponent;
        rowInd += 1;
        if(rowInd==NUM_POINT_BASED_FEATURES)
        {
            rowInd=0;
            colInd += 1;
        }
        iterator=iterator->nextClusterElement;
    }
    FILE *fp = fopen(PROGRAM_DATA_OUTPUT_PATH,"w");
    if(fp!=NULL)
    {
        int r,c;
        for(c=0; c<totFeatureVectors; c++)
        {
            for(r=0; r<NUM_POINT_BASED_FEATURES; r++)
            {
                fprintf(fp,"%lf\t",F(r,c));
            }
            fprintf(fp,"\n");
        }
    }
    fclose(fp);
    gmm_diag writerModel;
    bool result = writerModel.learn(F, 20, maha_dist, random_subset, 100, 100, VARIANCE_FLOORING_FACTOR, true);
    writerModel.dcovs.print("covariances:\n");
    writerModel.hefts.print("weights:\n");
    writerModel.means.print("means:\n");
    if(result==true)
    {
        printf("\nModel learnt");
    }
    else if(result==false)
    {
        printf("\nModel not learnt");
    }
    getchar();
    return 0;
}
TECHNICAL DETAILS:
The program is being run on a Ubuntu 14.04 OS using a Netbeans 8.0.2 IDE. The project is a C/C++ application
Any help would be most appreciated! Thanks in advance
~ Sid
You need to try the simplest possible case first, in order to narrow down the location of the bug. Your code is certainly not simple, and it's also not reproducible (nobody except you has all the functions).
The following simple code works, which suggests that the bug is somewhere else in your code.
I suspect your code is overwriting memory somewhere, leading to data and/or code corruption. The bug is probably an incorrect pointer, or incorrectly used pointer.
#include <fstream>
#include <armadillo>
using namespace std;
using namespace arma;
int main(int argc, char** argv) {
    mat F(5,72580, fill::randu);
    gmm_diag model;
    bool result = model.learn(F, 20, maha_dist, random_subset, 100, 100, 0.001, true);
    model.hefts.print("hefts:");
    model.means.print("means:");
    model.dcovs.print("dcovs:");
    return 0;
}
Output from above code:
gmm_diag::learn(): generating initial means
gmm_diag::learn(): k-means: iteration: 1 delta: 0.343504
gmm_diag::learn(): k-means: iteration: 2 delta: 0.0528804
...
gmm_diag::learn(): k-means: iteration: 100 delta: 3.02294e-06
gmm_diag::learn(): generating initial covariances
gmm_diag::learn(): EM: iteration: 1 avg_log_p: -0.624274
gmm_diag::learn(): EM: iteration: 2 avg_log_p: -0.586567
...
gmm_diag::learn(): EM: iteration: 100 avg_log_p: -0.472182
hefts:
0.0915 0.0335 0.0308 ...
means:
0.4677 0.1230 0.8582 ...
...
dcovs:
0.0474 0.0059 0.0080 ...
...
That branch can only be taken in the Armadillo code when seed_mode is equal to keep_existing, and the means matrix is empty. Why not simply do source-level debugging using your IDE and see at what point it's changing out from under you?
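One way to test the memory-overwrite suspicion is to make the fill step refuse to write past the end of the matrix. Below is a minimal sketch in plain C++, not the asker's code: FeatureSample and packColumnMajor are hypothetical names, a std::vector of samples stands in for the linked list, and a flat column-major buffer stands in for the Armadillo matrix. It throws as soon as the list turns out to hold more samples than the column count that was used to size the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the five features stored in one list element.
struct FeatureSample {
    double speed, dirCos, dirSin, curvCos, curvSin;
};

// Packs the samples into a column-major buffer with 5 rows and nCols columns.
// Throws if there are more samples than columns, instead of writing out of bounds.
std::vector<double> packColumnMajor(const std::vector<FeatureSample>& samples,
                                    std::size_t nCols) {
    const std::size_t nRows = 5;
    std::vector<double> buf(nRows * nCols, 0.0);
    std::size_t col = 0;
    for (const FeatureSample& s : samples) {
        if (col >= nCols) {
            throw std::out_of_range("more samples in the list than columns in the matrix");
        }
        double* column = &buf[col * nRows];  // start of this sample's column
        column[0] = s.speed;
        column[1] = s.dirCos;
        column[2] = s.dirSin;
        column[3] = s.curvCos;
        column[4] = s.curvSin;
        ++col;
    }
    assert(col <= nCols);
    return buf;
}
```

If the exception fires, the count returned by the element-counting function and the actual list length disagree, which would be one concrete place to look.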
I started using Rcpp to improve the speed of a for loop in R where each iteration depends on the previous one (i.e. no easy vectorization). My current code (below) is a bit faster than R but not nearly as fast as I would have expected. Are there any glaring inefficiencies in the code below that someone can spot? Any general (or specific) advice would be helpful.
UpdateInfections <- cxxfunction(signature(pop ="data.frame",inds="integer",alpha="numeric",t="numeric"), '
    DataFrame DF(pop);
    IntegerVector xinds(inds);
    NumericVector inf_time = DF["inf.time"];
    IntegerVector loc = DF["loc"] ;
    IntegerVector Rind = DF["R.indiv"] ;
    NumericVector infector = DF["infector"] ;
    IntegerVector vac = DF["vac"] ;
    NumericVector wts(loc.size());
    double xt = Rcpp::as<double>(t);
    double xalpha = Rcpp::as<double>(alpha);
    RNGScope scope; // Initialize Random number generator
    Environment base("package:base");
    Function sample = base["sample"];
    int n = loc.size();
    int i;int j;int k;
    int infsize = xinds.size();
    for (i=0;i<infsize;i++) {
        int infpoint = xinds[i]-1;
        NumericVector inf_times_prop(Rind[infpoint]);
        NumericVector inf_me(Rind[infpoint]);
        for (j=0; j<n;j++){
            if (j == infpoint){
                wts[j] = 0.0;
            } else if (loc[j] == loc[infpoint]){
                wts[j] = 1.0;
            } else {
                wts[j] = xalpha;
            }
        }
        inf_me = sample(n,Named("size",Rind[infpoint]),Named("prob",wts));
        //Note that these will be shifted by one
        for (k=0;k<Rind[infpoint];k++){
            inf_times_prop[k] = floor(::Rf_rlnorm(1.6,.6) + 0.5 + xt);
            if (inf_times_prop[k] < inf_time[inf_me[k]-1] && vac[inf_me[k]-1] == 0){
                inf_time[inf_me[k]-1] = inf_times_prop[k];
                infector[inf_me[k]-1] = inf_me[k];
            }
        }
    }
    // create a new data frame
    Rcpp::DataFrame NDF =
        Rcpp::DataFrame::create(Rcpp::Named("inf.time")=inf_time,
                                Rcpp::Named("loc")=loc,
                                Rcpp::Named("R.indiv")=Rind,
                                Rcpp::Named("infector")=infector,
                                Rcpp::Named("vac")=vac);
    return(NDF);
' , plugin = "Rcpp" )
We're actually working on a pure C++ sample function for RcppArmadillo right now. Take a look here http://permalink.gmane.org/gmane.comp.lang.r.rcpp/4179 or here http://permalink.gmane.org/gmane.comp.lang.r.rcpp for updates.
You are calling back to R. That cannot be as fast as a pure C++ solution.
Your example is also long, too long. I recommend profiling and optimizing individual pieces. There is, alas, still no entirely free lunch.
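To illustrate what avoiding the call back to R could look like, here is a minimal sketch of weighted sampling without replacement in plain C++. It is not the RcppArmadillo sample function mentioned above and not a drop-in replacement for R's sample(): sampleWithoutReplacement is a hypothetical name, it returns 0-based indices (so the "-1" shifts would go away), and it uses std::mt19937 rather than R's random number generator, so results will not follow set.seed(). It draws sequentially and removes each drawn index before the next draw, which is my understanding of how sample() behaves when prob is given and replace = FALSE.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <vector>

// Draws `size` distinct 0-based indices. Each draw is proportional to the
// remaining weights; a drawn index gets weight 0 so it cannot be drawn again.
std::vector<int> sampleWithoutReplacement(std::vector<double> weights,
                                          int size,
                                          std::mt19937& rng) {
    std::vector<int> picked;
    for (int k = 0; k < size; ++k) {
        double total = 0.0;
        for (double w : weights) {
            if (w > 0.0) total += w;
        }
        if (total <= 0.0) {
            throw std::invalid_argument("fewer positive weights than requested draws");
        }
        std::uniform_real_distribution<double> unif(0.0, total);
        const double u = unif(rng);
        double acc = 0.0;
        int idx = -1;
        for (std::size_t j = 0; j < weights.size(); ++j) {
            if (weights[j] <= 0.0) continue;  // zero-weight entries are never chosen
            idx = static_cast<int>(j);        // remember the last positive-weight index
            acc += weights[j];
            if (u < acc) break;
        }
        assert(idx >= 0);
        picked.push_back(idx);
        weights[idx] = 0.0;  // remove from later draws
    }
    return picked;
}
```

Each draw is a linear scan, so the cost per infector is on the order of n times the number of draws, the same order as the loop that fills wts, but without crossing the R/C++ boundary.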
Hey, so basically I have this issue, where I'm trying to put an equation inside of a function however it doesn't seem to set the value to the function and instead doesn't change it at all.
This is a predator prey simulation and I have this code inside of a for loop.
wolves[i+1] = ((1 - wBr) * wolves[i] + I * S * rabbits[i] * wolves[i]);
rabbits[i+1] = (1 + rBr) * rabbits[i] - I * rabbits[i] * wolves[i];
When I execute this, it works as intended and changes the value of both of these arrays appropriately, however when I try to put it inside of a function,
int calcRabbits(int R, int rBr, int I, int W)
{
    int x = (1 + rBr) * R - I * R * W;
    return x;
}
int calcWolves(int wBr, int W, int I, int S, int R)
{
    int x = ((1 - wBr) * W + I * S * R * R);
    return x;
}
And set the values as such
rabbits[i+1] = calcRabbits ( rabbits[i], rBr, I, wolves[i]);
wolves[i+1] = calcWolves(wBr, wolves[i], I, S, rabbits[i]);
The values remain the same as they were when they were initialized and it doesn't seem to work at all, and I have no idea why. I have been at this for a good few hours and it's probably something that I'm missing, but I can't figure it out.
Any and all help is appreciated.
Edit: I realized the parameters were wrong, but I had tried it before with the correct parameters and it still didn't work; I just accidentally changed it to the wrong parameters (the compiler mouse-over was showing the old version of the parameters).
Edit2: The entire section of code is this
days = getDays(); // Runs function to get Number of days to run the simulation for
dayCycle = getCycle(); // Runs the function get Cycle to get the # of days to mod by
int wolves[days]; // Creates array wolves[] the size of the amount of days
int rabbits[days]; // Creates array rabbits [] the size of the amount of days
wolves[0] = W; // Sets the value of the starting number of wolves
rabbits[0] = R; // sets starting value of rabbits
for(int i = 0; i < days; i++) // For loop runs the simulation for the number of days
{
    // rabbits[i+1] = calcRabbits ( rabbits[i], rBr, I, wolves[i]);
    // // //This is the code to change the value of both of these using the function
    // wolves[i+1] = calcWolves(wBr, wolves[i], I, S, rabbits[i]);
    // This is the code that works and correctly sets the value for wolves[i+1]
    wolves[i+1] = calcWolves(wBr, wolves[i], I, S, rabbits[i]);
    rabbits[i+1] = (1 + rBr) * rabbits[i] - I * rabbits[i] * wolves[i];
}
Edit: I realized my mistake, I was putting rBr and wBr in as ints, and they were floats which were numbers that were below 1, so they were being automatically converted to be 0. Thanks sje
Phil, I cannot see anything evidently wrong in your code.
My hunch is that you are messing up the parameters.
Using gdb at this point would be overkill. I recommend you put printouts in calcRabbits and calcWolves. Print out all the parameters, the new value, and the iteration number. That will give you a good idea of what is going on and will help trace the problem.
Do you have the full code with initialization we could try to test and run?
I'm not sure this is the problem, but this is bad:
int wolves[days]; // Creates array wolves[] the size of the amount of days
int rabbits[days]; // Creates array rabbits [] the size of the amount of days
days is determined at runtime. A variable-length array like this is nonstandard in C++ (and for a large number of days it could overflow your stack); array sizes should be compile-time constants. You can dynamically size a vector to work around this limitation (or heap-allocate the array).
Change to this:
std::vector<int> wolves(days);
std::vector<int> rabbits(days);
Or to this:
int *wolves = new int[days];
int *rabbits = new int[days];
// all your code goes here
delete [] wolves; // when you're done
delete [] rabbits; // when you're done
Which will dynamically allocate the array on the heap. The rest of the code should work the same.
Don't forget to #include <vector>, if you use the vector approach.
If you're still having problems, I would cout << "Days: " << days << endl; to make sure you're getting the right number back from getDays(). If you got zero, it would seem to manifest itself in "the loop not working".
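One more thing worth checking when sizing the containers: the posted loop runs i from 0 to days - 1 and writes index i + 1, so the last write goes to index days, which is one past the end of a container of size days. A minimal sketch of the vector approach with days + 1 entries follows. Populations and simulate are hypothetical names, and storing the populations as double is an assumption made here for illustration (the original arrays are int):

```cpp
#include <cassert>
#include <vector>

// Hypothetical result type: one entry per day, including day 0.
struct Populations {
    std::vector<double> wolves;
    std::vector<double> rabbits;
};

// Runs the recurrence for `days` steps. Each step writes index i + 1,
// so both vectors need days + 1 entries.
Populations simulate(int days, double W, double R,
                     double wBr, double rBr, double I, double S) {
    assert(days >= 0);
    Populations p;
    p.wolves.assign(days + 1, 0.0);
    p.rabbits.assign(days + 1, 0.0);
    p.wolves[0] = W;
    p.rabbits[0] = R;
    for (int i = 0; i < days; ++i) {
        p.wolves[i + 1] = (1 - wBr) * p.wolves[i] + I * S * p.rabbits[i] * p.wolves[i];
        p.rabbits[i + 1] = (1 + rBr) * p.rabbits[i] - I * p.rabbits[i] * p.wolves[i];
    }
    return p;
}
```

With std::vector you can also swap operator[] for .at() while debugging, which throws on an out-of-range index instead of silently corrupting memory.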
I was using an integer as an argument for a double.
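For anyone landing here with the same symptom, this is a small sketch of the difference. calcRabbitsInt keeps the int parameters from the question (the name is changed here only to tell the two versions apart), and calcRabbits takes the rates as double:

```cpp
#include <cassert>

// Parameter types as in the question: a rate such as 0.1 is converted to int
// at the call and arrives here as 0.
int calcRabbitsInt(int R, int rBr, int I, int W) {
    return (1 + rBr) * R - I * R * W;
}

// Same formula with double parameters, so fractional rates survive the call.
double calcRabbits(double R, double rBr, double I, double W) {
    assert(R >= 0.0 && W >= 0.0);
    return (1 + rBr) * R - I * R * W;
}
```

With rBr = 0.1 and I = 0.01 held in double variables, the int version receives 0 for both and returns R unchanged, which matches the "values remain the same" symptom, while the double version applies the rates.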