Levenberg–Marquardt not converging - c++

I am trying to fit a model using the Levenberg-Marquardt method as implemented in Numerical Recipes.
The problem is that it either does not converge or, when it does, the result is not precise. At the very least, the covariance matrix looks strange.
int i=0;
for (i = 0; i < 3e4; i++) {
mrqmin(x, y, sig, NPCalib, a, ia, 3, covar, alpha, &chisk, afunc,
if (chisk < 1e-8)
if (sumchisk > 5)
if (alamda > 1e8)
alamda = 1e8;
(x,y) are 3 points (double) that fit the form y=a(x-x0)^2 quite well.
Using sumchisk like this is what Numerical Recipes recommends for this function.
alamda is capped here because it might otherwise overflow.
Other definitions and data-points:
double a[4] = {0.0, 0.0001, 100.0, -1};
int ia[4] = {0, 1, 1, 0};
// declared as arrays: a double* cannot be initialized from a brace list of values
double x[] = {0.0, 799.157549545577, 799.92196995454, 800.683769692575};
double y[] = {0.0, 524.26491, 525.26768, 526.26586};
double sig[] = {0.0, 0.1*y[1], 0.1*y[2], 0.1*y[3]};
double **covar = new double*[4];
covar[1] = new double[4];
covar[2] = new double[4];
covar[3] = new double[4];
double **alpha = new double*[4];
alpha[1] = new double[4];
alpha[2] = new double[4];
alpha[3] = new double[4];
double chisk = 0;
double alamda = -1;
void afunc(int i, double x[], double a[], double *y, double dyda[], int ma)
{
    *y = a[1] * pow(x[i] + a[2], 2) / pow(1 + a[3] * CT[i - 1], 2);
    dyda[1] = pow(x[i] + a[2], 2) / pow(1 + a[3] * CT[i - 1], 2);
    dyda[2] = (2 * a[1] * (x[i] + a[2])) / pow(1 + a[3] * CalibTurn[i - 1], 2);
    dyda[3] = (-2 * a[1] * CT[i - 1] * pow(x[i] + a[2], 2)) / pow(1 + a[3] * CT[i - 1], 3);
}
I changed the NR source code to use double instead of float. The first array element is not used because the code comes from Fortran and I didn't feel like changing such a small detail.
The model also contains a third parameter, which isn't used in this fit and thus remains a[3]=-1, because ia[3]=0. ia[]=1 means the parameter is to be fitted.
However, I now have the problem that sometimes this doesn't converge. It finishes with alamda=1e8 and i=3e4, especially when I set the threshold for chisk lower.
The parameter sets seem to be fine, though: chisk is e.g. about 1e-6 and the parameters look reasonable. But on the diagonal of the covariance matrix (which should give the squared standard deviation of each parameter) there is rubbish like ~800000 for a parameter of 0.0001.
Does anyone know what I did wrong when using this algorithm?
Is there anything specific I need to write into covar/alpha before I start? Can sig be set like this?
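For cross-checking the reported covariance by hand, it may help to recall that it is the inverse of the curvature matrix alpha. The following is only a minimal sketch, not the Numerical Recipes routine. It assumes the simplified model y = a1*(x + a2)^2 (the 1/(1 + a3*CT)^2 factor is dropped because CT is not shown in the question) and uses 0-based arrays. The names Mat2, curvature and invert2x2 are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Mat2 = std::array<std::array<double, 2>, 2>;

// Curvature matrix alpha[k][l] = sum_i (dy/da_k)(dy/da_l) / sig_i^2
// for the ASSUMED model y = a1 * (x + a2)^2.
Mat2 curvature(const std::vector<double>& x, const std::vector<double>& sig,
               double a1, double a2) {
    assert(x.size() == sig.size());
    Mat2 alpha{};  // value-initialized to zeros
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double s = x[i] + a2;
        const double dyda[2] = {s * s, 2.0 * a1 * s};  // d/da1, d/da2
        const double w = 1.0 / (sig[i] * sig[i]);
        for (int k = 0; k < 2; ++k)
            for (int l = 0; l < 2; ++l)
                alpha[k][l] += w * dyda[k] * dyda[l];
    }
    return alpha;
}

// Covariance estimate = inverse of the 2x2 curvature matrix.
Mat2 invert2x2(const Mat2& m) {
    const double det = m[0][0] * m[1][1] - m[0][1] * m[1][0];
    assert(det != 0.0);  // a singular alpha cannot be inverted
    Mat2 inv{};
    inv[0][0] = m[1][1] / det;
    inv[0][1] = -m[0][1] / det;
    inv[1][0] = -m[1][0] / det;
    inv[1][1] = m[0][0] / det;
    return inv;
}
```

The square roots of the diagonal entries of invert2x2(curvature(...)) are the parameter standard errors under these assumptions. When alpha is close to singular, which can happen with few points over a narrow x range, large diagonal entries are to be expected.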


Fsolve equivalent in C++

I am trying to replicate MATLAB's fsolve in C++, where my project solves an implicit RK4 scheme. I am using the NLopt library with the NLOPT_LD_MMA algorithm. I have run the corresponding section in MATLAB, and it is considerably faster. Does anyone have ideas for a better fsolve equivalent in C++? Another reason for asking is that I would like f1 and f2 to both tend to zero, and it seems suboptimal to combine them into an L2 norm just because NLopt only allows a scalar return value from the objective function. Is there an alternative library, or a different algorithm or set of constraints, that would more closely replicate the default fsolve?
Would it perhaps be better (faster) to call Python's scipy.optimize.fsolve from C++?
double implicitRK4(double time, double V, double dt, double I, double O, double C, double R){
const int number_of_parameters = 2;
double lb[number_of_parameters];
double ub[number_of_parameters];
lb[0] = -999; // k1 lb
lb[1] = -999;// k2 lb
ub[0] = 999; // k1 ub
ub[1] = 999; // k2 ub
double k [number_of_parameters];
k[0] = 0.01;
k[1] = 0.01;
kOptData addData(time,V,dt,I,O,C,R);
opt = nlopt_create(NLOPT_LD_MMA, number_of_parameters);
nlopt_set_lower_bounds(opt, lb);
nlopt_set_upper_bounds(opt, ub);
nlopt_result nlopt_remove_inequality_constraints(nlopt_opt opt);
// nlopt_result nlopt_remove_equality_constraints(nlopt_opt opt);
double minf;
if (nlopt_optimize(opt, k, &minf) < 0) {
printf("nlopt failed!\n");
}
else {
printf("found minimum at f(%g,%g) = %0.10g\n", k[0], k[1], minf);
}
// 0.5 instead of (1/2): in C++, 1/2 is an integer division and evaluates to 0
return V + 0.5*dt*k[0] + 0.5*dt*k[1];
}
double solveKs(unsigned n, const double *x, double *grad, void *my_func_data){
kOptData *unpackdata = (kOptData*) my_func_data;
double t1,y1,t2,y2;
double f1,f2;
// Floating-point constants throughout: in C++, 1/2, 1/4 and 1/6 are integer divisions and evaluate to 0
t1 = unpackdata->time + (0.5 - sqrt(3.0)/6.0);
y1 = unpackdata->V + 0.25*unpackdata->dt*x[0] + (0.25 - sqrt(3.0)/6.0)*unpackdata->dt*x[1];
t2 = unpackdata->time + (0.5 + sqrt(3.0)/6.0);
y2 = unpackdata->V + (0.25 + sqrt(3.0)/6.0)*unpackdata->dt*x[0] + 0.25*unpackdata->dt*x[1];
f1 = x[0] - stateDeriv_implicit(t1,y1,unpackdata->dt,unpackdata->I,unpackdata->O,unpackdata->C,unpackdata->R);
f2 = x[1] - stateDeriv_implicit(t2,y2,unpackdata->dt,unpackdata->I,unpackdata->O,unpackdata->C,unpackdata->R);
return sqrt(pow(f1,2) + pow(f2,2));
}
My matlab version below seems to be a lot simpler but I would prefer the whole code in c++!
k1 = 0.01;
k2 = 0.01;
x0 = [k1,k2];
fun = @(x)solveKs(x,t,z,h,I,OCV1,Cap,Rct,static);
options = optimoptions('fsolve','Display','none');
k = fsolve(fun,x0,options);
% Calculate the next state vector from the previous one using RungeKutta
% update equation
znext = z + (1/2)*h*k(1) + (1/2)*h*k(2);
function [F] = solveKs(x,t,z,h,I,O,C,R,static)
t1 = t + ((1/2)-(1/6)*sqrt(3));
y1 = z + (1/4)*h*x(1) + ((1/4)-(1/6)*sqrt(3))*h *x(2);
t2 = t + ((1/2)+(1/6)*sqrt(3));
y2 = z + ((1/4)+(1/6)*sqrt(3))*h*x(1) + (1/4)*h*x(2);
F(1) = x(1) - stateDeriv_implicit(t1,y1,h,I,O,C,R,static);
F(2) = x(2) - stateDeriv_implicit(t2,y2,h,I,O,C,R,static);
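Since fsolve works on the vector residual F(x) = 0 directly instead of minimizing a scalar norm, one option in C++ is a small root finder for the two unknowns. Below is a minimal sketch of plain Newton iteration with a forward-difference Jacobian. It is not the algorithm fsolve uses by default (that is a trust-region dogleg method), it has no safeguards such as line search, and the names Vec2, System2 and newton2 are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <functional>

using Vec2 = std::array<double, 2>;
using System2 = std::function<Vec2(const Vec2&)>;

// Tries to solve F(x) = 0 for two unknowns; x holds the start value on entry
// and the last iterate on exit. Returns true if ||F(x)|| dropped below tol.
bool newton2(const System2& F, Vec2& x, double tol = 1e-12, int maxIter = 50) {
    for (int it = 0; it < maxIter; ++it) {
        const Vec2 f = F(x);
        if (std::hypot(f[0], f[1]) < tol) return true;

        // Forward-difference approximation of the 2x2 Jacobian.
        double J[2][2];
        for (int j = 0; j < 2; ++j) {
            Vec2 xh = x;
            const double h = 1e-7 * std::max(1.0, std::fabs(x[j]));
            xh[j] += h;
            const Vec2 fh = F(xh);
            J[0][j] = (fh[0] - f[0]) / h;
            J[1][j] = (fh[1] - f[1]) / h;
        }

        const double det = J[0][0] * J[1][1] - J[0][1] * J[1][0];
        if (det == 0.0) return false;  // singular Jacobian, give up

        // Solve J * dx = -f with Cramer's rule and take the full Newton step.
        const double dx0 = (-f[0] * J[1][1] + f[1] * J[0][1]) / det;
        const double dx1 = (-f[1] * J[0][0] + f[0] * J[1][0]) / det;
        x[0] += dx0;
        x[1] += dx1;
    }
    const Vec2 f = F(x);
    return std::hypot(f[0], f[1]) < tol;
}
```

For the scheme above, F would return {f1, f2} as computed in solveKs, with the start value {0.01, 0.01}.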

Iteration causes crash

What is wrong with this iteration?
This particular piece of code is causing my program to crash. When I disable it the program runs, but of course gives wrong results. It is supposed to compare sigma with sigma_last until they agree to a precision of 1e-14.
This is what I tried first:
long double sigma_last = NULL;
if(sigma_last != NULL){
sigma = sigma_last;
sigma1 = atan( tan(beta1) / cos(A1) );
sigmaM = (2*sigma1 + sigma) / 2;
d_sigma = B*sin(sigma)*(cos(2*sigmaM)+(1/4)*B*(cos(sigma)
sigma_last = sigma + d_sigma;
while(set_precision_14(sigma)<= set_precision_14(sigma_last) || set_precision_14(sigma)>= set_precision_14(sigma_last));
Then I tried using a pointer (desperately):
long double *sigma_last;
*sigma_last = NULL;
if(*sigma_last != NULL){
sigma = *sigma_last;
sigma1 = atan( tan(beta1) / cos(A1) );
sigmaM = (2*sigma1 + sigma) / 2;
d_sigma = B*sin(sigma)*(cos(2*sigmaM)+(1/4)*B*(cos(sigma)
*sigma_last = sigma + d_sigma;
while(set_precision_14(sigma)<= set_precision_14(*sigma_last) || set_precision_14(sigma)>= set_precision_14(*sigma_last));
Finding the source of the error in the entire code and trying to solve it took me hours, and I cannot come up with another "maybe this?". Feel free to smite me.
Here's a github link to my full code if anyone out there's interested.
In your first (and only) iteration, sigma_last is an uninitialized pointer, so dereferencing it results in a crash:
*sigma_last = NULL; // <-- dereferencing uninitialized ptr here
if(*sigma_last != NULL) { // <-- dereferencing uninitialized ptr here too
and, if those had been fixed, here:
*sigma_last = sigma + d_sigma;
This is because you have not set sigma_last to point to some valid floating-point space in memory. There doesn't seem to be any point to using a pointer in this particular case, so if I were you, I'd drop it and use a normal long double instead, as in your first attempt.
In your first example you assign NULL, which is really the value zero, to sigma_last. If zero is not what you intend, you could either go with a value that will most certainly be out of range (say 1e20, and then compare against, say, < 1e19) or keep a separate boolean for the job. I personally prefer the first option:
long double sigma_last = 1e20;
if(sigma_last < 1e19){
sigma = sigma_last;
A better way still would be to use an infinite, or finite, loop and then break out at a certain condition. This will make the code easier to read.
Finally, you seem to have a problem with your logic in the while, since the comparison sigma <= sigma_last || sigma >= sigma_last is always true. It's always smaller, bigger, or equal.
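The loop-and-break idea, together with a convergence test that is not always true, can be sketched as follows. This is a generic illustration under assumptions: the update step is passed in as a function rather than being the asker's d_sigma formula, and the names IterationResult and iterate_until_stable are invented for this example.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

struct IterationResult {
    long double value;  // last iterate
    int iterations;     // number of update steps performed
    bool converged;     // true if two successive iterates differed by less than tol
};

// Repeats current = step(current) until the change is below tol,
// or gives up after max_iter steps.
IterationResult iterate_until_stable(
        const std::function<long double(long double)>& step,
        long double start, long double tol = 1e-14L, int max_iter = 1000) {
    assert(tol > 0.0L && max_iter > 0);
    long double current = start;
    for (int i = 1; i <= max_iter; ++i) {
        const long double next = step(current);
        if (std::fabs(next - current) < tol) {
            return {next, i, true};  // break out once the value has settled
        }
        current = next;
    }
    return {current, max_iter, false};
}
```

Comparing the absolute difference against a tolerance replaces the `<= || >=` test, and the iteration cap keeps a non-converging case from looping forever.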
sigma_last does not need to be a pointer. You just need to somehow flag its value to know whether it has already been set. From your code I am not sure whether zero can be used for this purpose, but we can use some constant as a sentinel, like this one (note that LDBL_MIN is the smallest positive normalized long double, not the most negative value):
#include <float.h>
const long double invalid_constant = LDBL_MIN;
Try this:
long double DESTINATION_CALCULATION_plusplus ( double phi, double lambda, double S, double azimuth,
double a, double b, double *phi2, double* lambda2, double* azimuth2){
phi = phi*M_PI/180;
lambda = lambda*M_PI/180;
double A1;
double eu2 = (pow(a, 2) - pow(b, 2)) / pow(b, 2); //second eccentricity
double c = pow(a,2) / b;
double v = sqrt(1 + (eu2 * pow(cos(phi) , 2)));
double beta1 = tan(phi) / v;
double Aeq = asin( cos(beta1) * sin(azimuth) );
double f = (a - b) / a; //flattening
double beta = atan((1-f)*tan(phi));
double u2 = pow(cos(Aeq),2)*eu2;
long double sigma1 = atan( tan(beta1)/ cos(azimuth) );
long double A = 1 + u2*(4096 + u2*(-768+u2*(320-175*u2))) / 16384;
long double B = u2*(256 + u2*(-128+u2*(74-47*u2)))/1024;
long double sigma = S / (b*A);
long double sigmaM = (2*sigma1 + sigma) /2;
long double d_w;
long double d_sigma;
double C;
double d_lambda;
long double sigma_last=invalid_constant;
if(sigma_last != invalid_constant){
sigma = sigma_last;
sigma1 = atan( tan(beta1) / cos(A1) );
sigmaM = (2*sigma1 + sigma) / 2;
d_sigma = B*sin(sigma)*(cos(2*sigmaM)+(1/4)*B*(cos(sigma)
sigma_last = sigma + d_sigma;
while(set_precision_14(sigma)<= set_precision_14(sigma_last) || set_precision_14(sigma)>= set_precision_14(sigma_last));
sigma = sigma_last;
*phi2 = atan((sin(beta1)*cos(sigma)+cos(beta1)*sin(sigma)*cos(azimuth))/((1-f)
d_w = (sin(sigma)*sin(azimuth))/(cos(beta1)*cos(sigma) - sin(beta1)* sin(sigma)*cos(azimuth));
C = (f/16)*pow(cos(Aeq),2)*(4+f*(4-3*pow(cos(Aeq),2)));
d_lambda = d_w - (1-C)*f*sin(azimuth)*(sigma + C*sin(sigma)*
*lambda2 = lambda + d_lambda;
*azimuth2 = sin(Aeq) / (-sin(beta1)*sin(sigma)+cos(beta1)*cos(sigma)*cos(azimuth));
*azimuth2 = *azimuth2 * 180/M_PI;
*lambda2 = *lambda2 * 180/M_PI;
*phi2 = *phi2 * 180/M_PI;

Memory leaks in a simple Rcpp function

I am developing a package in R that I would like to convert to Rcpp for better performance. I'm new to Rcpp (and C++ in general.) My problem is that the Rcpp function I've written works fine if I run it many times with one set of arguments, but if I try to loop it over many combinations of arguments, it springs memory leaks and causes the R session to abort.
Here is the code in R, which holds up well to any test I throw at it:
raw_noise <- function(timesteps, mu, sigma, phi) {
delta <- mu * (1 - phi)
variance <- sigma^2 * (1 - phi^2)
noise <- vector(mode = "double", length = timesteps)
noise[1] <- c(rnorm(1, mu, sigma))
for (i in (1:(timesteps - 1))) {
noise[i + 1] <- delta + phi * noise[i] + rnorm(1, 0, sqrt(variance))
Here is the code in Rcpp, using three Rcpp sugar functions (pow, sqrt, rnorm):
NumericVector raw_noise(int timesteps, double mu, double sigma, double phi) {
double delta = mu * (1 - phi);
double variance = pow(sigma, 2.0) * (1 - pow(phi, 2.0));
NumericVector noise(timesteps);
noise[0] = R::rnorm(mu, sigma);
for(int i = 0; i < timesteps; ++i) {
noise[i+1] = delta + phi*noise[i] + R::rnorm(0, sqrt(variance));
return noise;
What really confuses me is that this code runs without problems:
rerun(10000, raw_noise(timesteps = 30, mu = 0.5, sigma = 0.2, phi = 0.3))
But when I run this code:
test_loop <- function(timesteps, mu, sigma, phi, replicates) {
params <- cross_df(list(timesteps = timesteps, phi = phi, mu = mu, sigma =
for (i in 1:nrow(params)) {
pmap(params[i,], raw_noise)
test_loop(timesteps=c(5, 6, 7, 8, 9, 10), mu=c(0.2, 0.5), sigma=c(0.2, 0.5),
phi=c(0, 0.1))
More often than not, the R session aborts and RStudio crashes altogether. But sometimes I manage to catch this error message before the R session aborts:
Error in match(x, table, nomatch = 0L) : GC encountered a node
(0x10db7af50) with an unknown SEXP type: NEWSXP at memory.c:1692
As I understand it, NEWSXP is an exotic object type in R that doesn't come up very often. What's happening looks to me like a memory leak, but I'm not at all sure how to fix it. Like I said, I'm new to Rcpp and C++ generally so I'd appreciate any nudges in the right direction.
You have an out of bounds error:
for(int i = 0; i < timesteps; ++i)
Here, noise[i+1] causes the index to exceed the defined range, since C++ indices start at 0 and not 1.
For example, indices 0 to timesteps - 1 cover timesteps elements and are therefore okay.
Indices 0 to timesteps would cover timesteps + 1 elements.
This can be seen if you change noise[i+1] to noise(i+1), which performs a bounds check on the requested index.
Error in raw_noise(100, 2, 3, 0.2) :
Index out of bounds: [index=100; extent=100].
To address this, make the following change:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector raw_noise(int timesteps, double mu, double sigma, double phi) {
    double delta = mu * (1 - phi);
    double variance = pow(sigma, 2.0) * (1 - pow(phi, 2.0));
    NumericVector noise(timesteps);
    noise[0] = R::rnorm(mu, sigma);
    // change here
    for(int i = 0; i < timesteps - 1; ++i) { // 1 less time step
        noise[i+1] = delta + phi*noise[i] + R::rnorm(0, sqrt(variance));
    }
    return noise;
}
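The same loop bound can be exercised outside of R. The sketch below is a plain C++ stand-in, not Rcpp code: it assumes std::vector in place of NumericVector and std::mt19937 with std::normal_distribution in place of R::rnorm, so the random numbers will not match R's. The function name raw_noise_std and the seed parameter are made up for this example. Using at() gives the same kind of bounds check as noise(i+1).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// AR(1) noise with the loop stopping one element early, so that
// noise.at(i + 1) never goes past the end of the vector.
std::vector<double> raw_noise_std(int timesteps, double mu, double sigma,
                                  double phi, unsigned seed) {
    assert(timesteps >= 0 && sigma >= 0.0 && std::fabs(phi) <= 1.0);
    const double delta = mu * (1.0 - phi);
    const double sd = sigma * std::sqrt(1.0 - phi * phi);

    std::mt19937 gen(seed);
    std::normal_distribution<double> z(0.0, 1.0);  // standard normal, scaled by hand

    std::vector<double> noise(static_cast<std::size_t>(timesteps));
    if (noise.empty()) return noise;

    noise.at(0) = mu + sigma * z(gen);
    for (std::size_t i = 0; i + 1 < noise.size(); ++i) {
        // at() throws std::out_of_range instead of silently corrupting memory
        noise.at(i + 1) = delta + phi * noise.at(i) + sd * z(gen);
    }
    return noise;
}
```

Writing the condition as i + 1 < noise.size() ties the bound directly to the index that is actually written.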

How to speed up bilinear interpolation of image?

I'm trying to rotate an image with interpolation, but it's too slow for real-time use on big images.
The code is something like:
for(int y=0;y<dst_h;++y)
for(int x=0;x<dst_w;++x)
//do inverse transform
fPoint pt(Transform(Point(x, y)));
//in coor of src
int x1= (int)floor(pt.x);
int y1= (int)floor(pt.y);
int x2= x1+1;
int y2= y1+1;
Mask[y][x]= 1; //show pixel
float dx1= pt.x-x1;
float dx2= 1-dx1;
float dy1= pt.y-y1;
float dy2= 1-dy1;
pd[x].blue= (dy2*(ps[y1*src_w+x1].blue*dx2+ps[y1*src_w+x2].blue*dx1)+
pd[x].green= (dy2*(ps[y1*src_w+x1].green*dx2+ps[y1*src_w+x2].green*dx1)+
pd[x].red= (dy2*(ps[y1*src_w+x1].red*dx2+ps[y1*src_w+x2].red*dx1)+
//nearest neighbour
//pd[x]= ps[((int)pt.y)*src_w+(int)pt.x];
Mask[y][x]= 0; //transparent pixel
pd+= dst_w;
How can I speed up this code? I tried to parallelize it, but there seems to be no speedup, perhaps because of the memory access pattern.
The key is to do most of your computations as ints. The only thing that is necessary to do as a float is the weighting. See here for a good resource.
From that same resource:
int px = (int)x; // floor of x
int py = (int)y; // floor of y
const int stride = img->width;
const Pixel* p0 = img->data + px + py * stride; // pointer to first pixel
// load the four neighboring pixels
const Pixel& p1 = p0[0 + 0 * stride];
const Pixel& p2 = p0[1 + 0 * stride];
const Pixel& p3 = p0[0 + 1 * stride];
const Pixel& p4 = p0[1 + 1 * stride];
// Calculate the weights for each pixel
float fx = x - px;
float fy = y - py;
float fx1 = 1.0f - fx;
float fy1 = 1.0f - fy;
int w1 = fx1 * fy1 * 256.0f;
int w2 = fx * fy1 * 256.0f;
int w3 = fx1 * fy * 256.0f;
int w4 = fx * fy * 256.0f;
// Calculate the weighted sum of pixels (for each color channel)
int outr = p1.r * w1 + p2.r * w2 + p3.r * w3 + p4.r * w4;
int outg = p1.g * w1 + p2.g * w2 + p3.g * w3 + p4.g * w4;
int outb = p1.b * w1 + p2.b * w2 + p3.b * w3 + p4.b * w4;
int outa = p1.a * w1 + p2.a * w2 + p3.a * w3 + p4.a * w4;
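The quoted snippet stops at the weighted sums, so here is a self-contained sketch of the whole sample function, including the final shift by 8 bits that undoes the 256 scaling of the weights. The Pixel and Image types are assumptions made for this example (8-bit RGBA, row-major storage), and sampleBilinear is a hypothetical name. The caller is assumed to keep the coordinates at least one pixel away from the right and bottom edges.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Pixel { std::uint8_t r, g, b, a; };  // assumed 8-bit RGBA layout

struct Image {
    int width;
    int height;
    std::vector<Pixel> data;  // row-major, width * height entries
};

Pixel sampleBilinear(const Image& img, float x, float y) {
    const int px = static_cast<int>(x);  // equals floor(x) for x >= 0
    const int py = static_cast<int>(y);
    assert(x >= 0.0f && y >= 0.0f && px + 1 < img.width && py + 1 < img.height);

    const int stride = img.width;
    const Pixel* p0 = img.data.data() + px + py * stride;
    const Pixel& p1 = p0[0];           // top-left
    const Pixel& p2 = p0[1];           // top-right
    const Pixel& p3 = p0[stride];      // bottom-left
    const Pixel& p4 = p0[stride + 1];  // bottom-right

    // Fractional parts become integer weights scaled by 256.
    const float fx = x - px, fy = y - py;
    const int w1 = static_cast<int>((1.0f - fx) * (1.0f - fy) * 256.0f);
    const int w2 = static_cast<int>(fx * (1.0f - fy) * 256.0f);
    const int w3 = static_cast<int>((1.0f - fx) * fy * 256.0f);
    const int w4 = static_cast<int>(fx * fy * 256.0f);

    // Integer weighted sum per channel; >> 8 divides by 256 again.
    auto blend = [&](int c1, int c2, int c3, int c4) {
        return static_cast<std::uint8_t>((c1 * w1 + c2 * w2 + c3 * w3 + c4 * w4) >> 8);
    };
    return {blend(p1.r, p2.r, p3.r, p4.r), blend(p1.g, p2.g, p3.g, p4.g),
            blend(p1.b, p2.b, p3.b, p4.b), blend(p1.a, p2.a, p3.a, p4.a)};
}
```

Because the weights are truncated to integers, their sum can fall slightly below 256, so results may come out a touch darker than with pure float arithmetic.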
Wow, you are doing a lot inside the innermost loop, such as:
1. float to int conversions
You can do everything on floats; they are pretty fast these days.
The conversion is what is killing you.
You are also mixing floats and ints together (if I see it right), which amounts to the same thing.
2. the Transform(...) call
Any unnecessary call causes heap thrashing and slows things down.
Instead, add 2 variables xx,yy and interpolate them inside your for loops.
3. if ...
Why on earth are you adding an if?
Limit the for ranges before the loop and not inside it.
The background can be filled with other for loops before or after.
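The suggestion to interpolate xx,yy inside the loops can be sketched like this, assuming the transform is affine (which a rotation is). The source coordinate is computed once at the start of a row and then advanced by a constant step per destination pixel, so no transform call remains in the inner loop. The Affine and SrcPoint types and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed affine map: src.x = a*x + b*y + tx, src.y = c*x + d*y + ty
struct Affine { double a, b, tx, c, d, ty; };
struct SrcPoint { double x, y; };

// Reference version: one full transform evaluation per pixel.
SrcPoint mapDirect(const Affine& m, int x, int y) {
    return {m.a * x + m.b * y + m.tx, m.c * x + m.d * y + m.ty};
}

// Incremental version: evaluate once per row, then add (a, c) per step in x.
std::vector<SrcPoint> mapRow(const Affine& m, int y, int width) {
    assert(width >= 0);
    std::vector<SrcPoint> row;
    row.reserve(static_cast<std::size_t>(width));
    SrcPoint p = mapDirect(m, 0, y);
    for (int x = 0; x < width; ++x) {
        row.push_back(p);
        p.x += m.a;  // moving one pixel right changes src.x by a
        p.y += m.c;  // and src.y by c
    }
    return row;
}
```

In a real loop the points would be consumed directly rather than stored in a vector; the vector is only there to make the result easy to compare against mapDirect. Rounding error accumulates along a row, which is usually negligible in double but worth checking for very wide images in float.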

Calculating distances but the result is - 2147483648

Below is the code to calculate the distance
// creating array of cities
double x[] = {21.0,12.0,15.0,3.0,7.0,30.0};
double y[] = {17.0,10.0,4.0,2.0,3.0,1.0};
// distance function - C = sqrt of A squared + B squared
One issue is that the order of operations is tripping you up (multiplication is done before subtraction). Change
(x[c1] - x[c2] * x[c1] - x[c2]) + (y[c1] - y[c2] * y[c1] - y[c2])
to
((x[c1] - x[c2]) * (x[c1] - x[c2])) + ((y[c1] - y[c2]) * (y[c1] - y[c2]))
I would also recommend, just for clarity, doing some of those calculations on separate lines (clearly that's a style choice that I prefer, and I'm sure some would disagree). It should make no difference to the compiler though
double deltaX = x[c1] - x[c2];
double deltaY = y[c1] - y[c2];
double distance = sqrt(deltaX * deltaX + deltaY * deltaY);
In my opinion that makes for more maintainable (and less error prone, as in this instance) code. Note that, as rewritten, the order of operations does not require extra parentheses.
Remember operator precedence: a - b * c - d means a - (b * c) - d.
Do you want
(x[c1] - (x[c2] * x[c1]) - x[c2])
or
((x[c1] - x[c2]) * (x[c1] - x[c2]))?
(x[c1] - x[c2] * x[c1] - x[c2]) is equivalent to (x[c1] - (x[c2] * x[c1]) - x[c2]) because * has higher precedence than -.
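To make the difference concrete, here is a small sketch that evaluates both readings of the expression. The function names are made up for illustration, and the coordinates are passed as arguments instead of being read from the global arrays.

```cpp
#include <cassert>
#include <cmath>

// What the compiler evaluates without the extra parentheses: a - (b * a) - b
double unparenthesized(double a, double b) { return a - b * a - b; }

// What the distance formula needs: (a - b) * (a - b)
double squaredDifference(double a, double b) { return (a - b) * (a - b); }

// Euclidean distance between (x1, y1) and (x2, y2).
double cityDistance(double x1, double y1, double x2, double y2) {
    const double sum = squaredDifference(x1, x2) + squaredDifference(y1, y2);
    assert(sum >= 0.0);  // a sum of squares cannot be negative
    return std::sqrt(sum);
}
```

For the first two cities, (21, 17) and (12, 10), the unparenthesized terms come out as -243 and -163, so the original code takes the square root of a negative number and gets NaN. Storing such a value in an int is undefined behavior, which may be where the -2147483648 comes from.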
I am going to go ahead and fix a couple of issues:
// creating array of cities
double x[] = {21.0,12.0,15.0,3.0,7.0,30.0};
double y[] = {17.0,10.0,4.0,2.0,3.0,1.0};
// distance function - C = sqrt of A squared + B squared
double dist(int c1, int c2) {
    double z = sqrt (
        ((x[c1] - x[c2]) * (x[c1] - x[c2])) + ((y[c1] - y[c2]) * (y[c1] - y[c2])));
    return z;
}
int main() // standard C++ requires main to return int
{
    int a[] = {1, 2, 3, 4, 5, 6};
    execute(a, 0, sizeof(a)/sizeof(int));
    int x; // note: shadows the global x array inside main
    printf("Type in a number \n");
    scanf("%d", &x);
    int y; // note: shadows the global y array inside main
    printf("Type in a number \n");
    scanf("%d", &y);
    double z = dist (x,y);
    cout << "The result is " << z;
    return 0;
}
This fixes the unused return value, and also fixes the order of operation, and incorrect variable type of int.