I am trying to replicate MATLAB's fsolve, because my project is in C++ and solves an implicit RK4 scheme. I am using the NLopt library with the NLOPT_LD_MMA algorithm. I ran the equivalent section in MATLAB and it is considerably faster. Does anyone know of a better fsolve equivalent in C++? A second reason for asking is that I want f1 and f2 to both tend to zero, and it seems suboptimal to combine them into an L2 norm just because NLopt only allows a scalar return value from the objective function. Is there an alternative library, or a different algorithm or set of constraints, that more closely replicates the default fsolve?
Would it perhaps be better (faster) to call Python's scipy.optimize.fsolve from C++?
double implicitRK4(double time, double V, double dt, double I, double O, double C, double R){
const int number_of_parameters = 2;
double lb[number_of_parameters];
double ub[number_of_parameters];
lb[0] = -999; // k1 lb
lb[1] = -999;// k2 lb
ub[0] = 999; // k1 ub
ub[1] = 999; // k2 ub
double k [number_of_parameters];
k[0] = 0.01;
k[1] = 0.01;
kOptData addData(time,V,dt,I,O,C,R);
nlopt_opt opt; //NLOPT_LN_MMA NLOPT_LN_COBYLA
opt = nlopt_create(NLOPT_LD_MMA, number_of_parameters);
nlopt_set_lower_bounds(opt, lb);
nlopt_set_upper_bounds(opt, ub);
nlopt_remove_inequality_constraints(opt); // no inequality constraints are used
// nlopt_result nlopt_remove_equality_constraints(nlopt_opt opt);
nlopt_set_min_objective(opt,solveKs,&addData);
double minf;
if (nlopt_optimize(opt, k, &minf) < 0) {
printf("nlopt failed!\n");
}
else {
printf("found minimum at f(%g,%g) = %0.10g\n", k[0], k[1], minf);
}
nlopt_destroy(opt);
return V + 0.5*dt*k[0] + 0.5*dt*k[1]; // 0.5 rather than (1/2), which is integer division and yields 0
}
double solveKs(unsigned n, const double *x, double *grad, void *my_func_data){
kOptData *unpackdata = (kOptData*) my_func_data;
double t1,y1,t2,y2;
double f1,f2;
// Floating-point literals are needed here: (1/2), (1/4) and (1/6) are integer divisions and evaluate to 0 in C++.
const double s3 = sqrt(3.0) / 6.0;
t1 = unpackdata->time + (0.5 - s3);
y1 = unpackdata->V + 0.25*unpackdata->dt*x[0] + (0.25 - s3)*unpackdata->dt*x[1];
t2 = unpackdata->time + (0.5 + s3);
y2 = unpackdata->V + (0.25 + s3)*unpackdata->dt*x[0] + 0.25*unpackdata->dt*x[1];
f1 = x[0] - stateDeriv_implicit(t1,y1,unpackdata->dt,unpackdata->I,unpackdata->O,unpackdata->C,unpackdata->R);
f2 = x[1] - stateDeriv_implicit(t2,y2,unpackdata->dt,unpackdata->I,unpackdata->O,unpackdata->C,unpackdata->R);
return sqrt(pow(f1,2) + pow(f2,2));
}
My MATLAB version below is a lot simpler, but I would prefer to keep the whole code in C++.
k1 = 0.01;
k2 = 0.01;
x0 = [k1,k2];
fun = @(x)solveKs(x,t,z,h,I,OCV1,Cap,Rct,static);
options = optimoptions('fsolve','Display','none');
k = fsolve(fun,x0,options);
% Calculate the next state vector from the previous one using RungeKutta
% update equation
znext = z + (1/2)*h*k(1) + (1/2)*h*k(2);
function [F] = solveKs(x,t,z,h,I,O,C,R,static)
t1 = t + ((1/2)-(1/6)*sqrt(3));
y1 = z + (1/4)*h*x(1) + ((1/4)-(1/6)*sqrt(3))*h *x(2);
t2 = t + ((1/2)+(1/6)*sqrt(3));
y2 = z + ((1/4)+(1/6)*sqrt(3))*h*x(1) + (1/4)*h*x(2);
F(1) = x(1) - stateDeriv_implicit(t1,y1,h,I,O,C,R,static);
F(2) = x(2) - stateDeriv_implicit(t2,y2,h,I,O,C,R,static);
end
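One possible direction, offered as a sketch rather than a drop-in replacement: fsolve solves F(k) = 0 directly (its default algorithm is trust-region dogleg), whereas minimizing a norm through NLopt turns a root-finding problem into an optimization problem. NLOPT_LD_MMA is also a gradient-based algorithm, and the solveKs objective above never fills grad. For a system of only two unknowns, a plain Newton iteration with a finite-difference Jacobian may already be enough. The names Vec2, Residual and newton2 are hypothetical, and the step size and tolerance are illustrative assumptions.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <functional>

using Vec2 = std::array<double, 2>;
using Residual = std::function<Vec2(const Vec2&)>;

// Solve F(x) = 0 for two unknowns with Newton's method.
// The Jacobian is approximated by forward differences.
// Returns true if max(|F1|, |F2|) dropped below tol within maxIter steps.
bool newton2(const Residual& F, Vec2& x, double tol = 1e-12, int maxIter = 50)
{
    assert(tol > 0.0 && maxIter > 0);
    for (int it = 0; it < maxIter; ++it) {
        const Vec2 f = F(x);
        if (std::max(std::fabs(f[0]), std::fabs(f[1])) < tol) return true;

        // Forward-difference Jacobian J[i][j] = dF_i / dx_j.
        double J[2][2];
        for (int j = 0; j < 2; ++j) {
            const double h = 1e-7 * std::max(1.0, std::fabs(x[j]));
            Vec2 xh = x;
            xh[j] += h;
            const Vec2 fh = F(xh);
            J[0][j] = (fh[0] - f[0]) / h;
            J[1][j] = (fh[1] - f[1]) / h;
        }

        // Solve J * dx = -f by Cramer's rule.
        const double det = J[0][0] * J[1][1] - J[0][1] * J[1][0];
        if (det == 0.0) return false;  // singular Jacobian, give up
        const double dx0 = (-f[0] * J[1][1] + f[1] * J[0][1]) / det;
        const double dx1 = (-f[1] * J[0][0] + f[0] * J[1][0]) / det;
        x[0] += dx0;
        x[1] += dx1;
    }
    const Vec2 f = F(x);
    return std::max(std::fabs(f[0]), std::fabs(f[1])) < tol;
}
```

In the RK4 setting, F would return {f1, f2} as computed in solveKs, and on success x holds k1 and k2. For something more robust, MINPACK-style hybrid solvers are available in C++, for example HybridNonLinearSolver in Eigen's unsupported NonLinearOptimization module or the multiroot solvers in GSL; whether they beat MATLAB on this problem would need measuring.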
I have two programs that should be identical but give different results: one in Mathematica (giving the correct result) and one in C++ (giving an incorrect one).
First the Mathematica:
q = 0.002344;
s = 0.0266;
v = 0.0744;
a = -q*PDCx^2;
b = s*PDCx - 2*q*PCLx*PDCx - PDCz;
c = -1*(PCLz + q*PCLx^2 - s*PCLx + v);
d = b*b - (4*a*c);
t = (-b + Sqrt[d])/(2*a)
Now the C++:
long double q = 0.002344;
long double s = 0.0266;
long double v = 0.0744;
long double a = -q * pow(PDCx, 2);
long double b = s * PDCx - 2 * q*PCLx*PDCx - PDCz;
long double c = (-1.0)*(PCLz + q * pow(PCLx, 2) - s * PCLx + v);
long double d = b * b - 4.0 * a*c;
t = (-b + sqrtf(d))/(2.0*a);
with
long double PCLx = -1.816017;
long double PCLz = 0.056013;
long double PDCx = 0.005073;
long double PDCz = -0.998134;
for each case. The Mathematica result is t = 0.1867646081 and the C++ result is t = 0.124776. This is the "plus" solution of the quadratic. The "minus" solutions also differ: 16549276.47723365 and 16549276.539223, respectively. I suspect that I am allowing the C++ result to be rounded incorrectly.
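A likely culprit, stated as an assumption since only this excerpt is available: sqrtf takes and returns float, so d is reduced to roughly 7 significant digits just before the subtraction -b + sqrt(d), where b and sqrt(d) nearly cancel because a is tiny. The sketch below uses the double-precision std::sqrt and, in addition, the algebraically equivalent form 2c / (-b - sqrt(d)), which avoids the cancellation when b > 0. The function names plusRoot and plusRootFromQuestion are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// "Plus" root of a*t^2 + b*t + c = 0, i.e. (-b + sqrt(d)) / (2a),
// rewritten as 2c / (-b - sqrt(d)). Assumes b > 0 and d >= 0.
double plusRoot(double a, double b, double c)
{
    const double d = b * b - 4.0 * a * c;
    assert(b > 0.0 && d >= 0.0);
    return 2.0 * c / (-b - std::sqrt(d));
}

// Coefficients built from the values given in the question.
double plusRootFromQuestion()
{
    const double q = 0.002344, s = 0.0266, v = 0.0744;
    const double PCLx = -1.816017, PCLz = 0.056013;
    const double PDCx = 0.005073, PDCz = -0.998134;

    const double a = -q * PDCx * PDCx;
    const double b = s * PDCx - 2.0 * q * PCLx * PDCx - PDCz;
    const double c = -(PCLz + q * PCLx * PCLx - s * PCLx + v);
    return plusRoot(a, b, c);
}
```

Simply replacing sqrtf with sqrtl (or std::sqrt on a long double argument) in the original code should also remove most of the discrepancy; the rewritten formula is just the more robust option.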
What is wrong with this iteration?
This particular piece of code is causing my program to crash. When I disable it the program runs, but of course gives wrong results. It is supposed to compare sigma with sigma_last until they agree to within 1e-14.
This is what I tried first:
long double sigma_last = NULL;
do{
if(sigma_last != NULL){
sigma = sigma_last;
}
sigma1 = atan( tan(beta1) / cos(A1) );
sigmaM = (2*sigma1 + sigma) / 2;
d_sigma = B*sin(sigma)*(cos(2*sigmaM)+(1/4)*B*(cos(sigma)
*(-1+2*pow(cos(2*sigmaM),2)))-(1/6)*B*cos(2*sigmaM)
*(-3+4*pow(sin(sigma),2))*(-3+4*pow(cos(2*sigmaM),2)));
sigma_last = sigma + d_sigma;
}
while(set_precision_14(sigma)<= set_precision_14(sigma_last) || set_precision_14(sigma)>= set_precision_14(sigma_last));
Then I tried using a pointer (desperately):
long double *sigma_last;
*sigma_last = NULL;
do{
if(*sigma_last != NULL){
sigma = *sigma_last;
}
sigma1 = atan( tan(beta1) / cos(A1) );
sigmaM = (2*sigma1 + sigma) / 2;
d_sigma = B*sin(sigma)*(cos(2*sigmaM)+(1/4)*B*(cos(sigma)
*(-1+2*pow(cos(2*sigmaM),2)))-(1/6)*B*cos(2*sigmaM)
*(-3+4*pow(sin(sigma),2))*(-3+4*pow(cos(2*sigmaM),2)));
*sigma_last = sigma + d_sigma;
}
while(set_precision_14(sigma)<= set_precision_14(*sigma_last) || set_precision_14(sigma)>= set_precision_14(*sigma_last));
Finding the source of the error in the whole program and trying to fix it took me hours, and I cannot come up with another "maybe this?". Feel free to smite me.
Here's a github link to my full code if anyone out there's interested.
In your first (and only) iteration, sigma_last is an uninitialized pointer, so dereferencing it crashes:
*sigma_last = NULL; // <-- dereferencing uninitialized ptr here
if(*sigma_last != NULL) { // <-- dereferencing uninitialized ptr here too
and, if you got past that, here as well:
*sigma_last = sigma + d_sigma;
This is because you have not set sigma_last to point to valid long double storage. There is no reason to use a pointer in this case, so I would drop it and use a plain long double, as in your first attempt.
In your first example you assign NULL, which is really just the value zero, to sigma_last. If zero is not what you intend, you could either use a value that is certain to be out of range (say 1e20, then test for < 1e19) or keep a separate boolean for the job. I personally prefer the first option:
long double sigma_last = 1e20;
...
if(sigma_last < 1e19){
sigma = sigma_last;
}
A better way still would be to use an infinite, or finite, loop and then break out at a certain condition. This will make the code easier to read.
Logic
Finally, you seem to have a problem with your logic in the while, since the comparison sigma <= sigma_last || sigma >= sigma_last is always true. It's always smaller, bigger, or equal.
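The loop-with-break suggestion, combined with a comparison that can actually become false, might look like the sketch below. It repeats an update until two successive values differ by less than a tolerance, with an iteration cap as a safety net. The function name iterateUntilStable, the tolerance 1e-14 and the cap of 100 iterations are illustrative assumptions; update stands in for the sigma + d_sigma computation.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Repeats value = update(value) until two successive values agree to
// within tol, or until maxIter iterations have run.
// Returns the last value; *converged reports which case occurred.
long double iterateUntilStable(const std::function<long double(long double)>& update,
                               long double start, bool* converged,
                               long double tol = 1e-14L, int maxIter = 100)
{
    assert(converged != nullptr);
    long double value = start;
    *converged = false;
    for (int i = 0; i < maxIter; ++i) {
        const long double next = update(value);
        const bool done = std::fabs(next - value) < tol;
        value = next;
        if (done) {
            *converged = true;
            break;
        }
    }
    return value;
}
```

Because the previous value lives in a local variable, no NULL or out-of-range sentinel is needed to detect the first pass.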
sigma_last does not need to be a pointer. You just need some way to flag whether its value has been set yet. From your code I am not sure whether zero can be used for this purpose, but we can use some other constant as a sentinel, for example LDBL_MIN (the smallest positive normalized long double):
#include <float.h>
const long double invalid_constant = LDBL_MIN;
Try this:
long double DESTINATION_CALCULATION_plusplus ( double phi, double lambda, double S, double azimuth,
double a, double b, double *phi2, double* lambda2, double* azimuth2){
phi = phi*M_PI/180;
lambda = lambda*M_PI/180;
double A1;
double eu2 = (pow(a, 2) - pow(b, 2)) / pow(b, 2); //second eccentricity
double c = pow(a,2) / b;
double v = sqrt(1 + (eu2 * pow(cos(phi) , 2)));
double beta1 = tan(phi) / v;
double Aeq = asin( cos(beta1) * sin(azimuth) );
double f = (a - b) / a; //flattening
double beta = atan((1-f)*tan(phi));
double u2 = pow(cos(Aeq),2)*eu2;
//////////////////////////////----------------------------------------------
long double sigma1 = atan( tan(beta1)/ cos(azimuth) );
long double A = 1 + u2*(4096 + u2*(-768+u2*(320-175*u2))) / 16384;
long double B = u2*(256 + u2*(-128+u2*(74-47*u2)))/1024;
long double sigma = S / (b*A);
long double sigmaM = (2*sigma1 + sigma) /2;
long double d_w;
long double d_sigma;
////////////////////////////------------------------------------------------
double C;
double d_lambda;
long double sigma_last=invalid_constant;
do{
if(sigma_last != invalid_constant){
sigma = sigma_last;
}
sigma1 = atan( tan(beta1) / cos(azimuth) ); // same expression as the initial sigma1 above
sigmaM = (2*sigma1 + sigma) / 2;
d_sigma = B*sin(sigma)*(cos(2*sigmaM)+(1/4)*B*(cos(sigma)
*(-1+2*pow(cos(2*sigmaM),2)))-(1/6)*B*cos(2*sigmaM)
*(-3+4*pow(sin(sigma),2))*(-3+4*pow(cos(2*sigmaM),2)));
sigma_last = sigma + d_sigma;
}
while(set_precision_14(sigma) != set_precision_14(sigma_last)); // stop once both values agree to 14 digits
sigma = sigma_last;
*phi2 = atan((sin(beta1)*cos(sigma)+cos(beta1)*sin(sigma)*cos(azimuth))/((1-f)
*sqrt(pow(sin(Aeq),2)+pow((sin(beta1)*sin(sigma)-cos(beta1)*cos(sigma)*cos(azimuth)),2))));
d_w = (sin(sigma)*sin(azimuth))/(cos(beta1)*cos(sigma) - sin(beta1)* sin(sigma)*cos(azimuth));
C = (f/16)*pow(cos(Aeq),2)*(4+f*(4-3*pow(cos(Aeq),2)));
d_lambda = d_w - (1-C)*f*sin(azimuth)*(sigma + C*sin(sigma)*
(cos(2*sigmaM)+C*cos(sigma)*(-1+2*pow(cos(2*sigmaM),2))));
*lambda2 = lambda + d_lambda;
*azimuth2 = sin(Aeq) / (-sin(beta1)*sin(sigma)+cos(beta1)*cos(sigma)*cos(azimuth));
*azimuth2 = *azimuth2 * 180/M_PI;
*lambda2 = *lambda2 * 180/M_PI;
*phi2 = *phi2 * 180/M_PI;
}
I have the following loop for a Monte Carlo computation I am performing.
The variables below are pre-computed/populated and are defined as:
w_ = std::vector<std::vector<double>>(150000, std::vector<double>(800));
C_ = Eigen::MatrixXd(800,800);
Eigen::VectorXd a(800);
Eigen::VectorXd b(800);
The while loop takes about 570 seconds to compute. Going by the loops alone, I understand that I have nPaths*m = 150,000 * 800 = 120,000,000 sets of computations (not counting the cdf computations handled by the Boost libraries).
I am a below-average programmer and was wondering whether I am making any obvious mistakes that may be slowing the computation down, or whether there is another way to structure the computation that would speed things up.
int N(0);
int nPaths(150000);
int m(800);
double Varsum(0.);
double err;
double delta;
double v1, v2, v3, v4;
Eigen::VectorXd d = Eigen::VectorXd::Zero(m);
Eigen::VectorXd e = Eigen::VectorXd::Zero(m);
Eigen::VectorXd f = Eigen::VectorXd::Zero(m);
Eigen::VectorXd y;
y0 = Eigen::VectorXd::Zero(m);
boost::math::normal G(0, 1.);
d(0) = boost::math::cdf(G, a(0) / C_(0, 0));
e(0) = boost::math::cdf(G, b(0) / C_(0, 0));
f(0) = e(0) - d(0);
while (N < (nPaths-1))
{
y = y0;
for (int i = 1; i < m; i++)
{
v1 = d(i - 1) + w_[N][(i - 1)]*(e(i - 1) - d(i - 1));
y(i - 1) = boost::math::quantile(G, v1);
v2 = (a(i) - C_.row(i).dot(y)) / C_(i, i);
v3 = (b(i) - C_.row(i).dot(y)) / C_(i, i);
d(i) = boost::math::cdf(G, v2);
e(i) = boost::math::cdf(G, v3);
f(i) = (e(i) - d(i))*f(i - 1);
}
N++;
delta = (f(m-1) - Intsum) / N;
Intsum += delta;
Varsum = (N - 2)*Varsum / N + delta*delta;
err = alpha_*std::sqrt(Varsum);
}
If I understand your code correctly, the running time is actually O(nPaths*m*m), about 10^11 operations, because the dot product C_.row(i).dot(y) itself needs O(m) operations.
You could speed up the program by a factor of two by not calculating it twice:
double prod = C_.row(i).dot(y);
v2 = (a(i) - prod) / C_(i, i);
v3 = (b(i) - prod) / C_(i, i);
but the compiler may already do this for you.
The other thing is that y consists of zeros beyond the entries filled in so far (at least at the beginning), so you don't need the full dot product, only the terms up to the current value of i. That should give another factor-of-two speed-up.
So, taking into account the sheer number of operations, your timings are not so bad. There is some room for improving the code, but if you want a speed-up of orders of magnitude, you should probably think about changing your formulation.
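Both suggestions can be combined: compute the dot product once per i, and only over the first i entries, since y(j) is still zero for j >= i. The sketch below uses plain std::vector so that it stands alone; with Eigen, something like C_.row(i).head(i).dot(y.head(i)) should express the same idea. The helper names partialDot and boundsForStep are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product of the first n entries of row and y.
// Entries from index n onward are skipped, which is valid when y[j] == 0 there.
double partialDot(const std::vector<double>& row,
                  const std::vector<double>& y, std::size_t n)
{
    assert(n <= row.size() && n <= y.size());
    double sum = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
        sum += row[j] * y[j];
    }
    return sum;
}

// Computes v2 and v3 for one step i, evaluating the dot product only once.
void boundsForStep(const std::vector<std::vector<double>>& C,
                   const std::vector<double>& a, const std::vector<double>& b,
                   const std::vector<double>& y, std::size_t i,
                   double& v2, double& v3)
{
    const double prod = partialDot(C[i], y, i);  // y[j] is still zero for j >= i
    const double inv = 1.0 / C[i][i];            // one division instead of two
    v2 = (a[i] - prod) * inv;
    v3 = (b[i] - prod) * inv;
}
```

On average the partial product touches about m/2 entries instead of m, which is where the second factor of two comes from.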
I'm trying to find a root with the simple fixed-point method in C++, but the point is that xr is a root of f(x) and an inflection point as well. In addition, the equation is a little more complex than the normal fixed-point method.
A constant c is added to the equation to check how quickly it converges to the root xr.
I was going to find a root and then check whether the root is an inflection point, but it is not working and I can't find the problem in my code.
I need your help.
The actual problem is:
Consider the root finding problem f(x)=0 with root xr, with f'(x)=0.
Convert it to the simple fixed-point problem.
x=x+c*f(x)=g(x)
with c a nonzero constant. How should c be chosen to ensure rapid convergence of
x(n+1)=x(n)+c*f(x(n)) (x(n+1) denotes the (n+1)th iterate of x)
to xr (provided that x0 is chosen sufficiently close to xr)? Apply your way of choosing c to the root-finding problem x*x*x-5=0. Start your program with x0=1.0, run it with several values of c, and discuss the observed trend in your results (in other words, the effect of the value of c on convergence behavior).
#include <stdio.h>
#include <conio.h>
#include <math.h>
#include <stdlib.h>
double gx(double x, double c)
{
return(x + c*(x*x*x - 5));
}
double gxpr(double x, double c)
{
return(x + c*(3 * x*x));
}
void Simple_Fixed_Point(double x, double c)
{
int i = 1;
long double x2=0.0;
long double x3=0.0;
long double ea=0.0;
long double ea2 = 0.0;
long double es = pow(10, -6);
printf("Simple Fixed Point Method\n");
Lbl:
x2 = gx(x,c);
printf("iteration=%d Root=%.5f Approximate error=%.15f\n", i++,
x2, ea);
if (ea=fabs((x2 - x)/x2*100) <es)
{
goto Lbm;
}
else
{
x = x2;
goto Lbl;
}
Lbm:
x3 = gxpr(x2, c);
if (ea2 = fabs((x3 - x2) / x3 * 100) < es)
{
goto End;
}
else
{
x2 = x3;
goto Lbm;
}
End:
getch();
}
int main(void)
{
Simple_Fixed_Point(1.0, 1.0);
return(0);
}
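Hope this helps you:
// Taylor expansion: f(x+dx) = f(x) + (dfdx) * dx
// f, x0, mineps, relax1 and relax2 are assumed to be defined elsewhere.
eps = 1.0;
dx = 1e-7; // something small
x = x0;
while (eps > mineps) {
f1 = f(x);
f2 = f(x + dx);
f3 = f(x + dx + dx);
d2fdx2 = (f3 - f2 - f2 + f1) / dx / dx; // second derivative by finite differences
dfdx = (f2 - f1) / dx; // first derivative by finite differences
x -= (relax1 * f1 / dfdx + relax2 * dfdx / d2fdx2); // relax1, relax2: relaxation factors, somewhat less than 1
eps = fmax(fabs(dfdx), fabs(f1));
}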
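For the fixed-point question itself, a sketch of the reasoning may help. The iteration x(n+1) = g(x(n)) converges near xr when |g'(xr)| = |1 + c*f'(xr)| < 1, and it converges fastest when g'(xr) = 0, i.e. c = -1/f'(xr); this presumes f'(xr) is nonzero. For f(x) = x^3 - 5, f'(xr) = 3*xr^2 is about 8.77, so c is about -0.114. The names residual and fixedPoint, the tolerance and the iteration cap are hypothetical choices for illustration.

```cpp
#include <cassert>
#include <cmath>

// f(x) = x^3 - 5, the test problem from the question.
double residual(double x) { return x * x * x - 5.0; }

// Iterates x <- x + c*f(x) from x0 until |x_new - x| < tol.
// Returns the number of iterations used, or -1 if the iteration did not
// converge within maxIter steps (or left the finite range).
// The final iterate is stored in *root.
int fixedPoint(double x0, double c, double* root,
               double tol = 1e-12, int maxIter = 1000)
{
    assert(root != nullptr);
    double x = x0;
    for (int n = 1; n <= maxIter; ++n) {
        const double next = x + c * residual(x);
        if (!std::isfinite(next)) break;  // diverged
        const bool done = std::fabs(next - x) < tol;
        x = next;
        if (done) {
            *root = x;
            return n;
        }
    }
    *root = x;
    return -1;
}
```

Calling fixedPoint(1.0, c, &root) for several values of c shows the trend: the closer 1 + c*f'(xr) is to zero, the fewer iterations are needed. With c = 1.0, as in the original call, g'(x) = 1 + 3*x*x is at least 1 everywhere, so the iterates blow up instead of converging.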
I've recently implemented an image warping method using OpenCV 2.31 (C++). However, this method is quite time consuming. After some investigation and improvements I managed to reduce the processing time from 400 ms to about 120 ms, which is quite nice.
I achieved this by unrolling the loop (which reduced the time from 400 ms to 330 ms) and then enabling optimization (the O2 flag) in my VC++ 2008 Express Edition compiler; this last change brought the processing time down to around 120 ms.
However, since I have other processing to implement around this warp, I'd like to reduce the processing time even further, to 20 ms. Lower would be better, of course, but I don't know if that is possible.
One more thing: I'd like to do this using freely available libraries.
All suggestions are more than welcome.
Below you'll find the method I'm speaking about.
Thanks for your help.
Ariel B.
cv::Mat Warp::pieceWiseWarp(const cv::Mat &Isource, const cv::Mat &s, TYPE_CONVERSION type)
{
cv::Mat Idest(roi.height,roi.width,Isource.type(),cv::Scalar::all(0));
float xi, xj, xk, yi, yj, yk, x, y;
float X2X1,Y2Y1,X2X,Y2Y,XX1,YY1,X2X1_Y2Y1,a1, a2, a3, a4,b1,b2,c1,c2;
int x1, y1, x2, y2;
char k;
int nc = roi.width;
int nr = roi.height;
int channels = Isource.channels();
int N = nr * nc;
float *alphaPtr = alpha.ptr<float>(0);
float *betaPtr = beta.ptr<float>(0);
char *triMaskPtr = triMask.ptr<char>(0);
uchar *IdestPtr = Idest.data;
for(int i = 0; i < N; i++, IdestPtr += channels - 1)
if((k = triMaskPtr[i]) != -1)// the pixel do belong to delaunay
{
cv::Vec3b t = trianglesMap.row(k);
xi = s.col(1).at<float>(t[0]); yi = s.col(0).at<float>(t[0]);
xj = s.col(1).at<float>(t[1]); yj = s.col(0).at<float>(t[1]);
xk = s.col(1).at<float>(t[2]); yk = s.col(0).at<float>(t[2]);
x = xi + alphaPtr[i]*(xj - xi) + betaPtr[i]*(xk - xi);
y = yi + alphaPtr[i]*(yj - yi) + betaPtr[i]*(yk - yi);
//...some bounds checking here...
x2 = ceil(x); x1 = floor(x);
y2 = ceil(y); y1 = floor(y);
//2. use bilinear interpolation on the pixel location - see wiki for formula...
//...3. copy the resulting intensity (GL) to the destination (i,j)
X2X1 = (x2 - x1);
Y2Y1 = (y2 - y1);
X2X = (x2 - x);
Y2Y = (y2 - y);
XX1 = (x - x1);
YY1 = (y - y1);
X2X1_Y2Y1 = X2X1*Y2Y1;
a1 = (X2X*Y2Y)/(X2X1_Y2Y1);
a2 = (XX1*Y2Y)/(X2X1_Y2Y1);
a3 = (X2X*YY1)/(X2X1_Y2Y1);
a4 = (XX1*YY1)/(X2X1_Y2Y1);
b1 = (X2X/X2X1);
b2 = (XX1/X2X1);
c1 = (Y2Y/Y2Y1);
c2 = (YY1/Y2Y1);
for(int c = 0; c < channels; c++)// Consider implementing this bilinear interpolation elsewhere in another function
{
if(x1 != x2 && y1 != y2)
IdestPtr[i + c] = Isource.at<cv::Vec3b>(y1,x1)[c]*a1
+ Isource.at<cv::Vec3b>(y2,x1)[c]*a2
+ Isource.at<cv::Vec3b>(y1,x2)[c]*a3
+ Isource.at<cv::Vec3b>(y2,x2)[c]*a4;
if(x1 == x2 && y1 == y2)
IdestPtr[i + c] = Isource.at<cv::Vec3b>(y1,x1)[c];
if(x1 != x2 && y1 == y2)
IdestPtr[i + c] = Isource.at<cv::Vec3b>(y1,x1)[c]*b1 + Isource.at<cv::Vec3b>(y1,x2)[c]*b2;
if(x1 == x2 && y1 != y2)
IdestPtr[i + c] = Isource.at<cv::Vec3b>(y1,x1)[c]*c1 + Isource.at<cv::Vec3b>(y2,x1)[c]*c2;
}
}
if(type == CONVERT_TO_CV_32FC3)
Idest.convertTo(Idest,CV_32FC3);
if(type == NORMALIZE_TO_1)
Idest.convertTo(Idest,CV_32FC3,1/255.);
return Idest;
}
I would suggest:
1. Replace division by a common factor with multiplication by its reciprocal, i.e. change a = a1/d; b = b1/d to d_1 = 1/d; a = a1*d_1; b = b1*d_1.
2. Replace the four if tests with a single bilinear interpolation.
I'm not sure whether this will help, but it is worth a try.
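The second suggestion could look like the sketch below. Taking x1 = floor(x) and x2 = x1 + 1 (rather than ceil(x)) keeps the weights well defined even when x is an integer, so the four special cases collapse into one formula, and the divisions disappear because the cell width is always 1. The sketch works on a plain single-channel, row-major buffer rather than cv::Mat, and bilinear is a hypothetical helper name; the second sample is clamped so that it stays inside the image.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Bilinear sample of a single-channel, row-major image at (x, y).
// Requires 0 <= x <= width - 1 and 0 <= y <= height - 1.
float bilinear(const std::vector<unsigned char>& img, int width, int height,
               float x, float y)
{
    assert(width > 0 && height > 0);
    assert(img.size() == static_cast<std::size_t>(width) * height);
    assert(x >= 0.0f && x <= width - 1 && y >= 0.0f && y <= height - 1);

    const int x1 = static_cast<int>(std::floor(x));
    const int y1 = static_cast<int>(std::floor(y));
    // Clamp the second sample so it never reads past the last row/column.
    const int x2 = std::min(x1 + 1, width - 1);
    const int y2 = std::min(y1 + 1, height - 1);

    const float fx = x - x1;  // fractional parts, in [0, 1)
    const float fy = y - y1;

    // Interpolate along x on the two rows, then along y between them.
    const float top    = img[y1 * width + x1] * (1.0f - fx) + img[y1 * width + x2] * fx;
    const float bottom = img[y2 * width + x1] * (1.0f - fx) + img[y2 * width + x2] * fx;
    return top * (1.0f - fy) + bottom * fy;
}
```

For a three-channel image the same weights would be reused for each channel, so they only need to be computed once per pixel.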