I have been struggling for a few days now to figure out how to port code from MATLAB or Python to C++...
My problem is with porting a non-linear least squares optimization function. Everything else so far is working perfectly, but here I have hit a roadblock.
MATLAB implementation: https://github.com/SuwoongHeo/Deformation-Transfer-Matlab
Python implementation: https://github.com/ziyeshanwai/python-deformation-transfer
The exact bit of code I'm having trouble porting ...
The MATLAB code is:
[b, res, resi] = lsqnonlin(@(b) resSimXform( b,double(A),double(B) ),(double(b0)),[],[],options);
function result = resSimXform( b,A,B )
r = b(1:4);
t = b(5:7);
s = b(8);
% s = b(8:10);
n = size(A,2);
if ~isreal(r)
a = 1;
end
R = vrrotvec2mat(r);
test =repmat(t', 1, n);
rot_A = diag(s) * R * A + repmat(t', 1, n);
result = sum(sum((B-rot_A).^2,2));
end
and the equivalent python code is:
b = least_squares(fun=resSimXform, x0=b0, jac='3-point', method='lm', args=(Points_A, Points_B),
ftol=1e-12, xtol=1e-12, gtol=1e-12, max_nfev=100000)
def resSimXform(b, A, B):
print("resSimXform function")
t = b[4:7]
R = np.zeros((3, 3))
R = R_axis_angle(R, b[0:3], b[3])
rot_A = b[7]*R.dot(A) + t[:, np.newaxis]
result = np.sqrt(np.sum((B-rot_A)**2, axis=0))
return result
I have tried various optimizing libraries in C++ with no luck... The way they work is apparently different from the ones in Python or Matlab.
I have tried Eigen's LevenbergMarquardt and dlib's solve_least_squares_lm but failed... I couldn't figure out how to set them up, since apparently they are different from their counterparts.
I have already implemented the "residual of similarity" function (resSimXform) in C++ and it works fine. I've been constantly debugging the results and checking them against the output from MATLAB and Python.
My version in C++ looks something like this (I'll try to clean it up more when I figure out how to use it with a least-squares function, but at the moment it returns exactly the same value as MATLAB):
double residual(MatrixXd x, MatrixXd A, MatrixXd B) {
MatrixXd r(1, 4);
r(0, 0) = x(0, 0);
r(0, 1) = x(0, 1);
r(0, 2) = x(0, 2);
r(0, 3) = x(0, 3);
MatrixXd t(1, 3);
t(0, 0) = x(0, 4);
t(0, 1) = x(0, 5);
t(0, 2) = x(0, 6);
double s = x(0, 7);
MatrixXd R(3, 3);
R = R_axis_angle(r);
MatrixXd rot_A(A.rows(), A.cols());
t.transposeInPlace();
t = t.col(0).replicate(1, A.cols());
rot_A = s * R * A + t;
MatrixXd fvecM = B - rot_A;
for (int i = 0; i < fvecM.rows(); i++) {
for (int j = 0; j < fvecM.cols(); j++) {
fvecM(i, j) = pow(fvecM(i, j), 2);
}
}
Eigen::VectorXd sums(3);
sums(0) = fvecM.row(0).sum();
sums(1) = fvecM.row(1).sum();
sums(2) = fvecM.row(2).sum();
return fvecM.sum();
}
I would greatly appreciate it if anyone out there could help me figure out which optimization library for C++ would work in this scenario and how to use it, and whether there are vast differences from the ones in MATLAB or Python, since my experience with this sort of optimizer is limited.
Thanks in advance...
I have already tried googling for a couple of days and couldn't find a solution so that's why I'm here :)
Figured it out with Eigen's unsupported LevenbergMarquardt module. You build a functor, e.g. my_functor(void): Functor(8, 8) {}. If you want to solve for 8 values, the functor has to be declared as 8x8 (8 inputs, 8 outputs).
The functor's operator() is basically the residual-of-similarity function.
Also, if your functor computes only a single value, you need to write it into all 8 entries of fvec; if you only assign fvec(0) = result, it will not converge:
for (int i = 0; i < x.rows(); i++) {
fvec(i) = sum;
}
And then you use the NumericalDiff mode...
Eigen::NumericalDiff<my_functor> numDiff(functor);
Eigen::LevenbergMarquardt<Eigen::NumericalDiff<my_functor>, double> lm(numDiff);
Hope this helps someone...Cheers
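As a complement to the workaround above: Levenberg-Marquardt style solvers generally expect a vector of residuals (one per data point) rather than a single summed value, which is what the Python version in the question returns. Below is a minimal sketch of such a per-point residual function using only the standard library. The names Vec3, Mat3, rotationFromAxisAngle and similarityResiduals are hypothetical, the parameter layout [axis, angle, translation, scale] is taken from the question, and Rodrigues' formula is assumed for the axis-angle conversion. In an Eigen functor, these values would presumably be copied into fvec, with the number of outputs set to the number of points (as far as I understand, the solver requires at least as many outputs as parameters).

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;                 // hypothetical point type
using Mat3 = std::array<std::array<double, 3>, 3>;  // hypothetical 3x3 matrix type

// Rotation matrix from an axis and an angle in radians (Rodrigues' formula).
// The axis is normalized here; a zero axis yields the identity.
Mat3 rotationFromAxisAngle(double ax, double ay, double az, double angle) {
    Mat3 R{};
    const double norm = std::sqrt(ax * ax + ay * ay + az * az);
    if (norm == 0.0) {
        R[0][0] = R[1][1] = R[2][2] = 1.0;
        return R;
    }
    const double x = ax / norm, y = ay / norm, z = az / norm;
    const double c = std::cos(angle), s = std::sin(angle), t = 1.0 - c;
    R[0] = {t * x * x + c,     t * x * y - s * z, t * x * z + s * y};
    R[1] = {t * x * y + s * z, t * y * y + c,     t * y * z - s * x};
    R[2] = {t * x * z - s * y, t * y * z + s * x, t * z * z + c};
    return R;
}

// p = [axis_x, axis_y, axis_z, angle, tx, ty, tz, scale].
// Returns one residual per point: the distance between B[i] and s*R*A[i] + t.
std::vector<double> similarityResiduals(const std::array<double, 8>& p,
                                        const std::vector<Vec3>& A,
                                        const std::vector<Vec3>& B) {
    const Mat3 R = rotationFromAxisAngle(p[0], p[1], p[2], p[3]);
    const double scale = p[7];
    std::vector<double> residuals(A.size());
    for (std::size_t i = 0; i < A.size(); ++i) {
        double squared = 0.0;
        for (std::size_t r = 0; r < 3; ++r) {
            double transformed = p[4 + r];  // translation component
            for (std::size_t c = 0; c < 3; ++c) {
                transformed += scale * R[r][c] * A[i][c];
            }
            const double diff = B[i][r] - transformed;
            squared += diff * diff;
        }
        residuals[i] = std::sqrt(squared);
    }
    return residuals;
}
```

With one residual per point, the solver sees how each point responds to a parameter change, instead of eight copies of the same number.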
So I have this Rcpp function in a .cpp file. You'll see that it is calling other custom functions that I don't show for simplicity, but those don't show any problem whatsoever.
// [[Rcpp::export]]
int sim_probability(float present_wealth , int time_left, int n, float mu, float sigma, float r, float gamma, float gu, float gl){
int i;
int count = 0;
float final_wealth;
NumericVector y(time_left);
NumericVector rw(time_left);
for(i=0;i<n;i++){
rw = random_walk(time_left, 0);
y = Y(rw, mu, sigma, r, gamma);
final_wealth = y[time_left-1] - y[0] + present_wealth;
if(final_wealth <= gu && final_wealth >= gl){
count = count + 1;
}
}
return count;
}
Then I can call this function from a .R seamlessly:
library(Rcpp)
sourceCpp("functions.cpp")
sim_probability(present_wealth = 100, time_left = 10, n = 1e3, mu = 0.05, sigma = 0.20, r = 0, gamma = 2, gu = 200, gl = 90)
But if I call it inside a for loop, no matter how small the loop is, R crashes without showing any error. The chunk below makes R crash.
for(l in 1:1){
sim_probability(present_wealth = 100, time_left = 10, n = 1e3, mu = 0.05, sigma = 0.20, r = 0, gamma = 2, gu = 200, gl = 90)
}
I've also tried to execute it manually (Ctrl + Enter) many times as fast as I could, and if I'm fast enough it also crashes.
I have tried smaller or bigger loops, both out and within the function. It also crashes if it's called from another Rcpp function. I know I shouldn't call Rcpp functions in a R loop. Eventually I intend to call it from another Rcpp function (to generate a matrix of data) but it crashes all the same.
I have followed other cases that I found by googling and tried a few things, such as changing to [] brackets for the array indices (this question) and playing with the gc() garbage collector (as suggested here).
I suspected that something happened with the NumericVector definitions. But as far as I can tell they are declared properly.
It has been fairly pointed out in the comments that this is not a reproducible example. I'll add the missing functions Y() and random_walk() below:
// [[Rcpp::export]]
NumericVector Y(NumericVector path, float mu, float sigma, float r, float gamma){
int time_step, n, i;
time_step = 1;
float theta, y0, prev, inc_W;
theta = (mu - r) / sigma;
y0 = theta / (sigma*gamma);
n = path.size();
NumericVector output(n);
for(i=0;i<n;i++){
if(i == 0){
prev = y0;
inc_W = path[0];
}else{
prev = output[i-1];
inc_W = path[i] - path[i-1];
}
output[i] = prev + (theta / gamma) * (theta * time_step + inc_W);
}
return output;
}
// [[Rcpp::export]]
NumericVector random_walk(int length, float starting_point){
if(length == 1){return starting_point;}
NumericVector output(length);
output[1] = starting_point;
int i;
for(i=0; i<length; i++){output[i+1] = output[i] + R::rnorm(0,1);}
return output;
}
Edit1: Added more code so it is reproducible.
Edit2: I was assigning local variables when calling the functions. That was dumb on my part, but harmless. I've fixed it, but the same error still persists.
Edit3: As it's been pointed out by Dirk in the comments, I was doing a pointless exercise redefining the rnorm(). Now it's removed and fixed.
The question has been solved in the comments by @coatless. I put the answer here to keep it for future readers. The thing is that the random_walk() function wasn't set up correctly.
The problem was that the loop inside the function let i run past the end of the vector output (it wrote to output[length]). Writing out of bounds is undefined behavior: it may appear to work when the function is called once, but it corrupts memory and blows up when the function is called many times in quick succession.
So in order to avoid this error and many others, the function should have been defined as
// [[Rcpp::export]]
NumericVector random_walk(int length, float starting_point){
if(length == 0){return starting_point;}
NumericVector output(length);
output[0] = starting_point;
int i;
for(i=0; i<length-1; i++){output[i+1] = output[i] + R::rnorm(0,1);}
return output;
}
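For comparison, here is a minimal sketch of the same idea in plain C++ (no Rcpp), where every write is bounds-checked. The name randomWalk and the use of std::mt19937 in place of R::rnorm are assumptions made for illustration only. Using .at() instead of [] turns an out-of-range index into an exception rather than silent memory corruption.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Random walk with `length` entries starting at `starting_point`, using
// standard-normal increments. Every write stays inside [0, length).
std::vector<double> randomWalk(std::size_t length, double starting_point,
                               std::mt19937& rng) {
    std::vector<double> output(length);
    if (length == 0) {
        return output;  // nothing to fill
    }
    std::normal_distribution<double> step(0.0, 1.0);
    output.at(0) = starting_point;
    // Stop one short of the end, because each iteration writes to i + 1.
    for (std::size_t i = 0; i + 1 < length; ++i) {
        // .at() throws std::out_of_range on a bad index.
        output.at(i + 1) = output.at(i) + step(rng);
    }
    return output;
}
```

If the loop bound were written as i < length, the last iteration would throw immediately instead of crashing at some later, unrelated point.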
I am using the Xtensor library for C++.
I have an xt::zeros({n, n, 3}) array and I would like to assign an xt::xarray{ , , } to its (i, j) element, so that it stores a 3-dimensional vector at each (i, j). However, the documentation does not mention assigning values, and in general I am unable to figure out from the documentation how arrays with multiple coordinates work.
What I have been trying is this
xt::xarray<double> force(Body body1, Body body2){
// Function to calculate the vector force on body2 from
// body 1
xt::xarray<double> pos1 = body1.get_position();
xt::xarray<double> pos2 = body2.get_position();
// If the positions are equal return the zero-vector
if(xt::all(xt::equal(pos1, pos2))) {
return xt::zeros<double>({1, 3});
}
xt::xarray<double> r12 = pos2 - pos1;
double dist = xt::linalg::norm(r12);
return -6.67259e-11 * body1.get_mass() * body2.get_mass()/pow(dist, 3) * r12;
}
xt::xarray <double> force_matrix(){
// Initialize the matrix that will hold the force vectors
xt::xarray <double> forces = xt::zeros({self_n, self_n, 3});
// Enter the values into the force matrix
for (int i = 0; i < self_n; ++i) {
for (int j = 0; j < self_n; ++j)
forces({i, j}) = force(self_bodies[i], self_bodies[j]);
}
}
Where I'm trying to assign the output of the force function as the ij'th coordinate in the forces array, but that does not seem to work.
In xtensor, assigning and indexing into multidimensional arrays is quite simple. There are two main ways:
Either index with round brackets:
xarray<double> a = xt::zeros<double>({3, 3, 5});
a(0, 1, 3) = 10;
a(1, 1, 0) = -100; ...
or by using the xindex type (which is a std::vector at the moment), and the square brackets:
xindex idx = {0, 1, 3};
a[idx] = 10;
idx[0] = 1;
a[idx] = -100; ...
Hope that helps.
You can also use view to achieve that.
In the inner loop, you could do:
xt::view(forces, i, j, xt::all()) = a_xarray_with_proper_size;
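To make the indexing less mysterious, here is a small sketch of what an (n, n, 3) array amounts to in memory, assuming a row-major layout (xtensor's default, as far as I know). ForceGrid is a hypothetical stand-in written with the standard library only; it is not how xtensor is implemented, just an illustration of how three coordinates map to one flat offset and how a whole 3-vector can be stored at (i, j).

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an (n, n, 3) array stored row-major in flat memory.
class ForceGrid {
public:
    explicit ForceGrid(std::size_t n) : n_(n), data_(n * n * 3, 0.0) {}

    // Element (i, j, k) lives at offset (i * n + j) * 3 + k.
    double& operator()(std::size_t i, std::size_t j, std::size_t k) {
        return data_[(i * n_ + j) * 3 + k];
    }

    // Store a whole 3-vector at (i, j): the analogue of assigning to
    // a view over the last axis.
    void set(std::size_t i, std::size_t j, const std::array<double, 3>& v) {
        for (std::size_t k = 0; k < 3; ++k) {
            (*this)(i, j, k) = v[k];
        }
    }

private:
    std::size_t n_;
    std::vector<double> data_;
};
```

Seen this way, forces(i, j, k) addresses a single number, while a view over the last axis addresses the three consecutive numbers that make up the vector at (i, j).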
I am new to the CPLEX Python API. I want to solve a linear programming problem in Python that I have already solved in the CPLEX OPL IDE using a .mod and a .dat file as inputs. I want to use Python because I need to vary my inputs continuously. My .mod file for the problem is given below. Can someone help me with how to use it from the Python API?
int n = ...;
int m = ...;
int c = ...;
int s = ...;
range v = 1..n;
range p = 1..m;
int c_req[v] = ...;
int s_req[v] = ...;
int trust[v][v] = ...;
// decision variables
dvar boolean assign[p][v];
// expressions
dexpr int used[pi in p] = max(vi in v) assign[pi][vi]; // used[pi] = 1 iff pi is used
dexpr int totalUsed = sum(pi in p) used[pi];
execute {
cplex.tilim = 60; // Time limit 60 seconds
}
// model
minimize totalUsed;
subject to {
forall(pi in p)
c_cap:
sum(vi in v) c_req[vi] * assign[pi][vi] <= c;
forall(pi in p)
s_cap:
sum(vi in v) s_req[vi] * assign[pi][vi] <= s;
forall(vi in v)
v_all:
sum(pi in p) assign[pi][vi] == 1;
forall(pi in p, v1 in v, v2 in v) if (v1 < v2) if (trust[v1][v2] == 0)
trust_constraint:
assign[pi][v1] + assign[pi][v2] <= 1;
}
You could write (after import subprocess)
subprocess.check_call(["C:/CPLEXStudio127/opl/bin/x64_win64/oplrun", "diet.mod", "diet.dat"])
in order to call OPL from Python. You would generate diet.dat beforehand.
Full example at https://www.ibm.com/developerworks/community/forums/html/threadTopic?id=0b6cacbe-4dda-4da9-9282-f527c3464f47
Then you do not have to migrate your model from OPL to Python.
You may also translate your model to Python, in which case I recommend DOcplex: https://developer.ibm.com/docloud/documentation/optimization-modeling/modeling-for-python/
regards
On Windows 10, running Visual Studio 2015 and OpenCV 3.0.
I am using OpenCV to first correlate two images and determine the translation between them using matchTemplate. I want a subpixel estimate, so I am going to take an 11x11 window of values from the correlation output and fit a quadratic surface to those points.
void Sector1::ResampSector(cv::Mat In, cv::Mat R, cv::Mat Out, cv::Point Loc)
{
// first get fractional offset
int lsq = 5;
// Ax^2 + B xy + Cy^2 + Dx +Ey + F = R
cv::setBreakOnError(true);
cv::Mat A( 121, 6, CV_32F);
cv::Mat B( 121, 1, CV_32F);
cv::Mat C (6, 1, CV_32F);
int L = 0;
for (int i = Loc.y-lsq; i <= Loc.y+lsq; i++) {
for (int j = Loc.x-lsq; j <= Loc.x+lsq; j++) {
A.at<float>(L, 0) = float(i*i);
A.at<float>(L, 1) = (float)i*j;
A.at<float>(L, 2) = (float)j*j;
A.at<float>(L, 3) = (float)i;
A.at<float>(L, 4) = (float)j;
A.at<float>(L, 5) = 1.f;
B.at<float>(L) = R.at<float>(i, j); // since is 3 band stuff ?
L++;
} // for j
} // for i
bool rc = cv::solve(A, B, C);
The call to cv::solve returns false, and there are two cv::Exceptions at the same address, which is outside any of the image matrices or other variables. I have looked at the contents of A, B and C using the memory window and they all appear correct, as do their structures. I have tried to step into solve, but I do not have the library with symbol tables.
Any clue where I have gone wrong? Any suggestions for tracking the problem further?
Lapack complains that the default method will not work. A is 121x6, so the system is overdetermined, and the default DECOMP_LU requires a square matrix. The correction is to pass DECOMP_QR as the fourth, optional argument (flags) in the call to solve(), i.e. cv::solve(A, B, C, cv::DECOMP_QR).
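Once the six coefficients are available, the subpixel peak is the stationary point of the fitted surface. A minimal sketch of that last step follows, assuming the surface is written as f(u, v) = a*u^2 + b*u*v + c*v^2 + d*u + e*v + f0 with the coefficients in the same order as the columns of A. The function name quadraticPeak is hypothetical. Setting both partial derivatives to zero gives a 2x2 linear system, which is solved in closed form here.

```cpp
#include <cmath>

// Stationary point of f(u, v) = a*u^2 + b*u*v + c*v^2 + d*u + e*v + f0.
// Solves  2a*u + b*v = -d  and  b*u + 2c*v = -e  by Cramer's rule.
// Returns false when the system is (near) singular, i.e. the fitted
// surface has no unique stationary point.
bool quadraticPeak(double a, double b, double c, double d, double e,
                   double& u, double& v) {
    const double det = 4.0 * a * c - b * b;
    if (std::abs(det) < 1e-12) {
        return false;
    }
    u = (b * e - 2.0 * c * d) / det;
    v = (b * d - 2.0 * a * e) / det;
    return true;
}
```

It may also help to build A from coordinates measured relative to Loc (offsets from -5 to 5) rather than absolute pixel indices, since squares of large indices stored in CV_32F can make the fit poorly conditioned; the returned (u, v) is then directly the fractional offset.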
Let's say A and B are matrices of the same size.
In Matlab, I could use simple indexing as below.
idx = A>0;
B(idx) = 0
How can I do this in OpenCV? Should I just use
for (i=0; ... rows)
for(j=0; ... cols)
if (A.at<double>(i,j)>0) B.at<double>(i,j) = 0;
something like this? Is there a better (faster and more efficient) way?
Moreover, in OpenCV, when I try
Mat idx = A>0;
the variable idx seems to be a CV_8U matrix (not boolean but integer).
You can easily convert this MATLAB code:
idx = A > 0;
B(idx) = 0;
// same as
B(A>0) = 0;
to OpenCV as:
Mat1d A(...);
Mat1d B(...);
Mat1b idx = A > 0;
B.setTo(0, idx);
// or
B.setTo(0, A > 0);
Regarding performance, in C++ it's usually faster (it depends on the enabled optimizations) to work on raw pointers (but is less readable):
for (int r = 0; r < B.rows; ++r)
{
double* pA = A.ptr<double>(r);
double* pB = B.ptr<double>(r);
for (int c = 0; c < B.cols; ++c)
{
if (pA[c] > 0.0) pB[c] = 0.0;
}
}
Also note that in OpenCV there isn't any boolean matrix, but it's a CV_8UC1 matrix (aka a single channel matrix of unsigned char), where 0 means false, and any value >0 is true (typically 255).
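To illustrate the mask convention without OpenCV, here is a small standard-library sketch that mimics it on flat vectors. The helper names greaterThanZero and setWhere are hypothetical; they only emulate what A > 0 and setTo(value, mask) do conceptually, and are not OpenCV functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Emulates `A > 0`: an 8-bit mask holding 255 where the test is true
// and 0 where it is false.
std::vector<std::uint8_t> greaterThanZero(const std::vector<double>& a) {
    std::vector<std::uint8_t> mask(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        mask[i] = a[i] > 0.0 ? 255 : 0;
    }
    return mask;
}

// Emulates `B.setTo(value, mask)`: overwrite b wherever the mask is non-zero.
// Assumes mask has at least as many elements as b.
void setWhere(std::vector<double>& b, const std::vector<std::uint8_t>& mask,
              double value) {
    for (std::size_t i = 0; i < b.size(); ++i) {
        if (mask[i] != 0) {
            b[i] = value;
        }
    }
}
```

Any non-zero mask value counts as true, which is why a comparison result can be passed straight to setTo.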
Evaluation
Note that this may vary according to optimization enabled with OpenCV. You can test the code below on your PC to get accurate results.
Time in ms:

            my results             my results         @AdrienDescamps
            (OpenCV 3.0 No IPP)    (OpenCV 2.4.9)
Matlab  :   13.473
C++ Mask:   640.824                5.81815            ~5
C++ Loop:   5.24414                4.95127            ~4
Note: I'm not entirely sure about the cause of the performance drop with OpenCV 3.0, so I'll just repeat: test the code below on your PC to get accurate results.
As @AdrienDescamps stated in the comments:
It seems that the performance drop with OpenCV 3.0 is related to the OpenCL option, that is now enabled in the comparison operator.
C++ Code
#include <opencv2/opencv.hpp>
#include <iostream>
using namespace std;
using namespace cv;
int main()
{
// Random initialize A with values in [-100, 100]
Mat1d A(1000, 1000);
randu(A, Scalar(-100), Scalar(100));
// B initialized with some constant (5) value
Mat1d B(A.rows, A.cols, 5.0);
// Operation: B(A>0) = 0;
{
// Using mask
double tic = double(getTickCount());
B.setTo(0, A > 0);
double toc = (double(getTickCount()) - tic) * 1000 / getTickFrequency();
cout << "Mask: " << toc << endl;
}
{
// Using for loop
double tic = double(getTickCount());
for (int r = 0; r < B.rows; ++r)
{
double* pA = A.ptr<double>(r);
double* pB = B.ptr<double>(r);
for (int c = 0; c < B.cols; ++c)
{
if (pA[c] > 0.0) pB[c] = 0.0;
}
}
double toc = (double(getTickCount()) - tic) * 1000 / getTickFrequency();
cout << "Loop: " << toc << endl;
}
getchar();
return 0;
}
Matlab Code
% Random initialize A with values in [-100, 100]
A = (rand(1000) * 200) - 100;
% B initialized with some constant (5) value
B = ones(1000) * 5;
tic
B(A>0) = 0;
toc
UPDATE
OpenCV 3.0 uses IPP optimization in the function setTo. If you have that enabled (you can check with cv::getBuildInformation()), you'll have a faster computation.
Miki's answer is very good, but I just want to add some clarification about the performance problem to avoid any confusion.
It is true that the best way to implement an image filter (or any algorithm) with OpenCV is to use the raw pointers, as shown in the second C++ example of Miki (C++ Loop).
Using the at function is also correct, but significantly slower.
However, most of the time, you don't need to worry about that, and you can simply use the high level functions of OpenCV (first example of Miki , C++ Mask). They are well optimized, and will usually be almost as fast as a low level loop on pointers, or even faster.
Of course, there are exceptions (we just found one), and you should always test for your specific problem.
Now, regarding this specific problem :
The example here, where the high-level function was much slower (100x slower) than the low-level loop, is NOT a normal case, as demonstrated by the timings with other versions/configurations of OpenCV, which are much lower.
The problem seems to be that when OpenCV 3.0 is compiled with OpenCL, there is a huge overhead the first time a function that uses OpenCL is called. The simplest solution is to disable OpenCL at compile time if you use OpenCV 3.0 (see also here for other possible solutions if you are interested).
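If a one-time initialization cost is suspected, one way to check is to run the operation once untimed before measuring. Here is a minimal sketch of such a timing helper using only the standard library; the name timeAfterWarmup and its parameters are hypothetical, and it is not tied to OpenCV in any way.

```cpp
#include <chrono>
#include <functional>

// Average time per call in milliseconds, measured after `warmup`
// untimed calls that absorb any one-time initialization cost.
// Assumes runs > 0.
double timeAfterWarmup(const std::function<void()>& fn, int warmup, int runs) {
    for (int i = 0; i < warmup; ++i) {
        fn();  // not timed
    }
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i) {
        fn();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / runs;
}
```

For the benchmark above, this would mean wrapping B.setTo(0, A > 0) in a lambda and comparing the result for warmup = 0 against warmup = 1; a large gap between the two would point to first-call overhead rather than a slow function.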