R crashes when calling a Rcpp function in a loop - c++

So I have this Rcpp function in a .cpp file. You'll see that it is calling other custom functions that I don't show for simplicity, but those don't show any problem whatsoever.
// [[Rcpp::export]]
int sim_probability(float present_wealth , int time_left, int n, float mu, float sigma, float r, float gamma, float gu, float gl){
int i;
int count = 0;
float final_wealth;
NumericVector y(time_left);
NumericVector rw(time_left);
for(i=0;i<n;i++){
rw = random_walk(time_left, 0);
y = Y(rw, mu, sigma, r, gamma);
final_wealth = y[time_left-1] - y[0] + present_wealth;
if(final_wealth <= gu && final_wealth >= gl){
count = count + 1;
}
}
return count;
}
Then I can call this function from a .R seamlessly:
library(Rcpp)
sourceCpp("functions.cpp")
sim_probability(present_wealth = 100, time_left = 10, n = 1e3, mu = 0.05, sigma = 0.20, r = 0, gamma = 2, gu = 200, gl = 90)
But if I call it inside a for loop, no matter how small the loop is, R crashes without showing any error. The chunk below makes R crash:
for(l in 1:1){
sim_probability(present_wealth = 100, time_left = 10, n = 1e3, mu = 0.05, sigma = 0.20, r = 0, gamma = 2, gu = 200, gl = 90)
}
I've also tried executing it manually (Ctrl + Enter) many times as fast as I could, and if I'm fast enough, it also crashes.
I have tried smaller and bigger loops, both outside and inside the function. It also crashes if it's called from another Rcpp function. I know I shouldn't call Rcpp functions in an R loop; eventually I intend to call it from another Rcpp function (to generate a matrix of data), but it crashes all the same.
I have followed other cases I found by googling and tried a few things, such as switching to [] brackets for the arrays' indices (this question) and playing with the gc() garbage collector (as suggested here).
I suspected that something happened with the NumericVector definitions. But as far as I can tell they are declared properly.
It has been fairly pointed out in the comments that this is not a reproducible example, so I'm adding the missing functions Y() and random_walk() below:
// [[Rcpp::export]]
NumericVector Y(NumericVector path, float mu, float sigma, float r, float gamma){
int time_step, n, i;
time_step = 1;
float theta, y0, prev, inc_W;
theta = (mu - r) / sigma;
y0 = theta / (sigma*gamma);
n = path.size();
NumericVector output(n);
for(i=0;i<n;i++){
if(i == 0){
prev = y0;
inc_W = path[0];
}else{
prev = output[i-1];
inc_W = path[i] - path[i-1];
}
output[i] = prev + (theta / gamma) * (theta * time_step + inc_W);
}
return output;
}
// [[Rcpp::export]]
NumericVector random_walk(int length, float starting_point){
if(length == 1){return starting_point;}
NumericVector output(length);
output[1] = starting_point;
int i;
for(i=0; i<length; i++){output[i+1] = output[i] + R::rnorm(0,1);}
return output;
}
Edit1: Added more code so it is reproducible.
Edit2: I was assigning local variables when calling the functions. That was dumb on my part, but harmless, and I've fixed it. The same error still persists.
Edit3: As Dirk pointed out in the comments, redefining rnorm() was a pointless exercise. It has now been removed.

The question was answered in the comments by @coatless; I'm posting the answer here for future readers. The problem was that the random_walk() function was not set up correctly.
The loop inside the function let i run past the end of the vector output: on the last iteration it writes output[length], one element beyond the allocated size (and the starting point was stored in output[1] instead of output[0]). That out-of-bounds write is undefined behavior. A single call may appear to work, but it silently corrupts memory, and calling the function repeatedly eventually crashes R.
So, to avoid this error and many others, the function should have been defined as
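// [[Rcpp::export]]
NumericVector random_walk(int length, float starting_point){
    // Empty walk: return an explicitly empty vector. A bare
    // "return starting_point;" would go through NumericVector's
    // size constructor instead of creating a one-element vector.
    if(length <= 0){return NumericVector(0);}
    NumericVector output(length);
    output[0] = starting_point;   // the first element is index 0
    int i;
    // Stop at length-1 so the last write is output[length-1], still in bounds
    for(i=0; i<length-1; i++){output[i+1] = output[i] + R::rnorm(0,1);}
    return output;
}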
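For anyone who wants to check the indexing fix without an R setup, here is a plain C++ analogue (a sketch, not the Rcpp function itself). Assumptions: std::vector stands in for NumericVector, std::normal_distribution with a caller-supplied std::mt19937 stands in for R::rnorm, and randomWalk is a hypothetical name. Using .at() instead of [] makes an off-by-one index throw std::out_of_range rather than silently writing past the end, which is what the original loop did.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Plain C++ analogue of the corrected random_walk(): element 0 is the starting
// point and each later element adds one N(0, 1) increment. .at() is
// bounds-checked, so an off-by-one index throws std::out_of_range instead of
// silently writing past the end of the buffer.
std::vector<double> randomWalk(int length, double startingPoint, std::mt19937& rng) {
    std::vector<double> output;
    if (length <= 0) return output;            // empty walk
    output.resize(length);
    std::normal_distribution<double> step(0.0, 1.0);
    output.at(0) = startingPoint;
    for (int i = 0; i < length - 1; ++i)       // last write is output[length - 1]
        output.at(i + 1) = output.at(i) + step(rng);
    return output;
}
```

Changing the loop bound to i < length would make the last iteration call output.at(length) and throw, instead of corrupting memory the way the Rcpp version did.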

Related

simulated annealing algorithm

I implemented simulated annealing in C++ to minimize (x-2)^2+(y-1)^2 in some range.
I'm getting varied output which is not acceptable for this type of heuristic method. It seems that the solution is converging but never quite closing in on the solution.
My code:
#include <bits/stdc++.h>
using namespace std;
double func(double x, double y)
{
return (pow(x-2, 2)+pow(y-1, 2));
}
double accept(double z, double minim, double T,double d)
{
double p = -(z - minim) / (d * T);
return pow(exp(1), p);
}
double fRand(double fMin, double fMax)
{
double f = (double)rand() / RAND_MAX;
return fMin + f * (fMax - fMin);
}
int main()
{
srand (time(NULL));
double x = fRand(-30,30);
double y = fRand(-30,30);
double xm = x, ym=y;
double tI = 100000;
double tF = 0.000001;
double a = 0.99;
double d=(1.6*(pow(10,-23)));
double T = tI;
double minim = func(x, y);
double z;
double counter=0;
while (T>tF) {
int i=1;
while(i<=30) {
x=x+fRand(-0.5,0.5);
y=y+fRand(-0.5,0.5);
z=func(x,y);
if (z<minim || (accept(z,minim,T,d)>(fRand(0,1)))) {
minim=z;
xm=x;
ym=y;
}
i=i+1;
}
counter=counter+1;
T=T*a;
}
cout<<"min: "<<minim<<" x: "<<xm<<" y: "<<ym<<endl;
return 0;
}
How can I get it to reach the solution?
There are a couple of things that I think are wrong in your implementation of the simulated annealing algorithm.
At every iteration you should look at some neighbours z of current minimum and update it if f(z) < minimum. If f(z) > minimum you can also accept the new point, but with an acceptance probability function.
The problem is that in your accept function the parameter d is far too small: with d = 1.6e-23 the exponent -(z - minim)/(d*T) is a huge negative number, so accept() always returns 0.0 for a worse point and the acceptance condition is never triggered. Try something like 1e-5; it doesn't have to be physically correct, it only has to make the acceptance probability decrease as the "temperature" is lowered. (Incidentally, pow(exp(1), p) is simply exp(p).)
After updating the temperature in the outer loop, you should set x=xm and y=ym before running the inner loop; otherwise, instead of searching the neighbours of the current solution, you will basically wander around randomly (and you aren't checking any boundaries either).
Doing so, I usually get some output like this:
min: 8.25518e-05 x: 2.0082 y: 0.996092
Hope it helped.
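A minimal sketch applying this advice, assuming C++11 <random> in place of rand()/srand(). It keeps the question's schedule (T from 100000 down to 0.000001, factor 0.99, 30 moves per temperature) and the scale factor d = 1e-5 suggested above. As a slight variation on resetting x=xm, y=ym, it keeps a separate current point that moves only when a proposal is accepted, tracks the best point separately, and clamps proposals to [-30, 30]. The names anneal, clampToBox and Result are my own.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>

// Objective from the question: minimum 0 at (2, 1).
double objective(double x, double y) {
    return (x - 2) * (x - 2) + (y - 1) * (y - 1);
}

// Keep a coordinate inside the search box [-30, 30].
double clampToBox(double v) { return std::max(-30.0, std::min(30.0, v)); }

struct Result { double x, y, f; };

// Simulated annealing: neighbours are proposed around the current point, worse
// moves are accepted with probability exp(-delta / (d * T)), and the best point
// ever seen is tracked separately from the current one.
Result anneal(unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> start(-30.0, 30.0);
    std::uniform_real_distribution<double> step(-0.5, 0.5);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    const double d = 1e-5;                     // scale factor instead of 1.6e-23

    double cx = start(rng), cy = start(rng);   // current point
    double cf = objective(cx, cy);
    Result best = {cx, cy, cf};                // best point so far

    for (double T = 100000.0; T > 0.000001; T *= 0.99) {  // geometric cooling
        for (int i = 0; i < 30; ++i) {                      // moves per temperature
            double nx = clampToBox(cx + step(rng));
            double ny = clampToBox(cy + step(rng));
            double nf = objective(nx, ny);
            double delta = nf - cf;
            if (delta < 0 || unit(rng) < std::exp(-delta / (d * T))) {
                cx = nx; cy = ny; cf = nf;                  // move only if accepted
                if (cf < best.f) best = Result{cx, cy, cf};
            }
        }
    }
    return best;
}
```

The exact digits of anneal(seed) vary with the seed and the standard library, but the returned point should sit close to (2, 1).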

Generating a random number from a lognormal distribution

This shouldn't be too difficult, but for some reason sampling random numbers from a distribution is really tripping me up.
I know the best options for generating random numbers from a distribution are the Boost/C++11 libraries... Unfortunately, I can't get this code to compile with c++0x, and in any case I would prefer to keep compatibility with a server I'm also using, which runs gcc 4.1.2 (ancient, I know, and it doesn't support newer C++). Frustrating. And as always, a time crunch means I need to do the best I can with a quick fix.
Taking the exponential of a normally distributed number from the Box-Muller equations is my next option, but I'm not getting a lognormal distribution with the parameters I specify. I don't understand why this is not working.
Any help would be hugely appreciated!
void testRNG(){
int mean = 5000;
int std = 50;
ofstream out("./Output/normal_samples.out");
RunningStats normal;
for (int i=0;i<2000;++i){
double sample = randomSample(mean, std, NORMAL);//call function with box muller transformation to return a number from a normal distribution
out<<sample<<endl;
normal.Push(sample);//keep a running average of sampled numbers
}
cout<<"Normal Mean = "<<normal.Mean()<<endl;
cout<<"Normal Std = "<<normal.StandardDeviation()<<endl;
RunningStats lognormal;
for (int i=0;i<2000;++i){
double sample = randomSample(mean, std, LOGNORMAL);
out<<sample<<endl;
lognormal.Push(sample);
}
cout<<"Lognormal Mean = "<<lognormal.Mean()<<endl;
cout<<"Lognormal Std = "<<lognormal.StandardDeviation()<<endl;
}
The sampling functions, which I didn't write, first go through a switch statement in randomSample() and then call:
EDIT - I noticed that it actually does call a function to find the lognormal parameters. Added in.
double randNormal(double mean, double stdev) {
static long numSamples = 0;
static double Z2;
if ((numSamples++ & 1) == 0) {
double Z1, U1, U2;
do { U1 = randUniform(0, 1); } while (U1 <= 0 || U1 >= 1);
do { U2 = randUniform(0, 1); } while (U1 <= 0 || U1 >= 1);
Z1 = sqrt(-2 * log(U1)) * cos(6.28318531 * U2);
Z2 = sqrt(-2 * log(U1)) * sin(6.28318531 * U2);
return mean + stdev * Z1;
} else {
return mean + stdev * Z2;
}
}
double randLognormal(double mu, double sigma) {
return exp(randNormal(mu, sigma));
}
double randLognormalMeanStdev(double mean, double stdev) {
return randLognormal( log(mean) - 0.5 * log(1 + (stdev * stdev) / (mean * mean)) , log(1 + (stdev * stdev) / (mean * mean)));
}
So the output I get is:
Normal Mean = 4998.72 //I said 5000
Normal Std = 49.7054 //I said 50
Lognormal Mean = 4999.74
Lognormal Std = 0.492766 //this is the part that is not working
What am I missing to get the lognormal std to be what I want?
Other options would also be appreciated - maybe there is something else I am missing.
Thanks in advance!
Edit - I realized I should have made it clear that I need to sample from a lognormal distribution
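A likely cause worth checking: randLognormalMeanStdev() passes log(1 + stdev^2/mean^2) as the sigma argument, but that quantity is sigma^2, the variance of log(X). With mean 5000 and stdev 50 it is about 1e-4, so the samples have a standard deviation of roughly 5000 * 1e-4 = 0.5, which matches the 0.49 observed. (Separately, the U2 loop in randNormal tests U1 instead of U2.) The sketch below takes the square root before sampling. It sticks to C++98 (no <random>) because of the gcc 4.1.2 constraint; the function names are hypothetical, and it uses std::rand() and discards the second Box-Muller value for simplicity.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// Uniform draw in the open interval (0, 1); never exactly 0, so log() is safe.
double uniformOpen() {
    return (std::rand() + 1.0) / (RAND_MAX + 2.0);
}

// One N(0, 1) draw via the Box-Muller transform (the second value is discarded
// for simplicity; the question's randNormal caches it instead).
double standardNormal() {
    const double twoPi = 6.28318530717958647692;
    double u1 = uniformOpen();
    double u2 = uniformOpen();
    return std::sqrt(-2.0 * std::log(u1)) * std::cos(twoPi * u2);
}

// Convert the desired mean/stdev of X into mu/sigma of the normal log(X).
void lognormalParams(double mean, double stdev, double& mu, double& sigma) {
    assert(mean > 0 && stdev >= 0);
    double variance = std::log(1.0 + (stdev * stdev) / (mean * mean)); // this is sigma^2
    sigma = std::sqrt(variance);          // pass sigma, not sigma^2
    mu = std::log(mean) - 0.5 * variance;
}

// Sample X with E[X] = mean and SD[X] = stdev.
double lognormalMeanStdev(double mean, double stdev) {
    double mu, sigma;
    lognormalParams(mean, stdev, mu, sigma);
    return std::exp(mu + sigma * standardNormal());
}
```

In the question's code the equivalent change is wrapping the second argument of randLognormal in sqrt().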

C/C++ declared and defined variable turns invisible

The entirety of my code is a bit too much to post here, so I'll try to show the essentials.
I am coding a simple graphical analogue clock (12-hour, with three hands).
Currently my code works if I let the clock run from the default, i.e. with all hands starting at 12.
However, I have added a feature that allows editing of the time shown. Because of this, regardless of a hand's starting position, when it reaches 12 the next larger hand should tick once. My code is below.
for (psi = 0; psi<6.28318530718-0.5236; psi+=0.5235987756) {
float xply = sin(psi);
float yply = cos(psi);
int hhx = x0 + (circleRad-100)*xply;
int hhy = y0 - (circleRad-100)*yply;
float phi;
for (phi = 0; phi<6.28318530718-0.10472; phi+=0.1047197551) {
float Multx = sin(phi);
float Multy = cos(phi);
int mhx = x0 + (circleRad-50)*Multx;
int mhy = y0 - (circleRad-50)*Multy;
float theta;
for (theta= 0; theta<6.28318530718-0.104720; theta+=0.1047197551) {
// If seconds are given then the if condition is tested
if (secPhase > 0) {
float angle = theta+secPhase;
// If second hand reach top, for loop breaks and enters a new loop for next minute, secphase is erased as new minute start from 0 secs.
if (angle > 6.28318530718-0.104720) {
plotHands(angle, x0, y0, circleRad, a, mhx, mhy, hhx, hhy, bytes);
capture.replaceOverlay(true, (const unsigned char*)a);
sleep(1);
secPhase = 0;
break;
}
// if second hand has not reached top yet, then plotting continues
plotHands(angle, x0, y0, circleRad, a, mhx, mhy, hhx, hhy, bytes);
capture.replaceOverlay(true, (const unsigned char*)a);
sleep(1);
}
// if there were no seconds given, plotting begins at 12.
else {
plotHands(theta, x0, y0, circleRad, a, mhx, mhy, hhx, hhy, bytes);
capture.replaceOverlay(true, (const unsigned char*)a);
sleep(1);
}
}
}
}
Currently my code works for seconds. There are declared and defined values, that I have not included here, that I can alter that will change the starting position of each hand and wherever the second hand is, when it hits 12 the minute hand will tick once.
This is the problem. Logically, I could just apply the same concept that I used for the second hand but migrate it to the minute hand and change the respective variable names involved so that when the minute hand does strike 12, the hour hand will move. This is the code that breaks:
for (phi = 0; phi<6.28318530718-0.10472; phi+=0.1047197551) {
if (minPhase > 0) {
float minAngle = phi + minPhase;
if (minAngle > 6.28318530718-0.10472) {
minPhase = 0;
break;
}
float Multx = sin(minAngle);
float Multy = cos(minAngle);
int mhx = x0 + (circleRad-50)*Multx;
int mhy = y0 - (circleRad-50)*Multy;
}
else {
float Multx = sin(phi);
float Multy = cos(phi);
int mhx = x0 + (circleRad-50)*Multx;
int mhy = y0 - (circleRad-50)*Multy;
}
}
I have included only the middle for loop, which handles the minute hand. These loops and statements ensure that if no starting point is given for the minute hand, the else branch runs; if there is a starting point, the minute hand ticks from it until it reaches twelve, at which point the loop breaks back to the hour for loop, which ticks once, and the minute hand's starting point is cleared so that it starts afresh in the new hour.
However once I attempt to compile the code, the compiler tells me:
error: 'mhx' was not declared in this scope
error: 'mhy' was not declared in this scope
It shows this every time these variables are used in the function that draws the minute hand, as if they had simply disappeared. They are clearly declared and defined in my code, but when they are used in the for loop below, the compiler claims they are missing.
I also found that if I removed the else branch, the code compiled and ran, but was broken, i.e. the minute hand was not in its proper position.
Can anyone enlighten me please? I am still very new to C and C++.
Thank you in advance.
The variables go out of scope when they hit the closing brace of either the if or the else. Declare them outside of the scope and assign their values inside the if/else blocks.
for (phi = 0; phi<6.28318530718-0.10472; phi+=0.1047197551) {
if (minPhase > 0) {
float minAngle = phi + minPhase;
if (minAngle > 6.28318530718-0.10472) {
minPhase = 0;
break;
}
float Multx = sin(minAngle);
float Multy = cos(minAngle);
int mhx = x0 + (circleRad-50)*Multx;
int mhy = y0 - (circleRad-50)*Multy;
// Multx, Multy, mhx, mhy will go out of scope when the following brace is reached
}
else {
float Multx = sin(phi);
float Multy = cos(phi);
int mhx = x0 + (circleRad-50)*Multx;
int mhy = y0 - (circleRad-50)*Multy;
// Multx, Multy, mhx, mhy will go out of scope when the following brace is reached
}
}
You should instead do this:
for (phi = 0; phi<6.28318530718-0.10472; phi+=0.1047197551) {
float Multx, Multy;
int mhx, mhy;
// These variables are now visible in the entire for loop body, not just in the if or else block they were declared in.
if (minPhase > 0) {
float minAngle = phi + minPhase;
if (minAngle > 6.28318530718-0.10472) {
minPhase = 0;
break;
}
Multx = sin(minAngle);
Multy = cos(minAngle);
mhx = x0 + (circleRad-50)*Multx;
mhy = y0 - (circleRad-50)*Multy;
}
else {
Multx = sin(phi);
Multy = cos(phi);
mhx = x0 + (circleRad-50)*Multx;
mhy = y0 - (circleRad-50)*Multy;
}
}
You need to move mhx and mhy to the scope above the if statement so that they are visible outside the if/else.
for (phi = 0; phi<6.28318530718-0.10472; phi+=0.1047197551) {
int mhx, mhy; // move declaration here
if (minPhase > 0) {
float minAngle = phi + minPhase;
if (minAngle > 6.28318530718-0.10472) {
minPhase = 0;
break;
}
float Multx = sin(minAngle);
float Multy = cos(minAngle);
mhx = x0 + (circleRad-50)*Multx; // no longer a declaration, just assignment
mhy = y0 - (circleRad-50)*Multy;
}
else {
float Multx = sin(phi);
float Multy = cos(phi);
mhx = x0 + (circleRad-50)*Multx; // no longer a declaration, just assignment
mhy = y0 - (circleRad-50)*Multy;
}
}
I assume you have other code in the body of your for loop after this if statement that you haven't shown.
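Both answers declare the variables in the enclosing scope. A further simplification is possible because the two branches differ only in the angle: decide the position inside the if, then compute mhx and mhy once after it. The sketch below is a self-contained illustration, not the poster's program. HandTip and minuteTipsForHour are hypothetical names, drawing (plotHands, capture.replaceOverlay, sleep) is replaced by collecting positions, and the float angle phi is replaced by an integer tick count, because the bound 6.28318530718-0.10472 lies within float rounding of 59 steps of 0.1047197551, so the number of iterations of the float loop can depend on rounding.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct HandTip { int x; int y; };

// One minute tick = 2*pi/60 radians (6 degrees).
const double kTickAngle = 6.28318530718 / 60.0;

// Hypothetical helper mirroring the question's minute-hand loop for one hour.
// startTick (0-59) plays the role of minPhase, counted in ticks instead of radians.
std::vector<HandTip> minuteTipsForHour(int& startTick, int x0, int y0, int circleRad) {
    std::vector<HandTip> tips;
    for (int tick = 0; tick < 60; ++tick) {
        int t = tick;                    // declared before the if, so it is visible after it
        if (startTick > 0) {
            t = tick + startTick;
            if (t >= 60) {               // minute hand reached 12: caller ticks the hour hand
                startTick = 0;
                break;
            }
        }
        // Computed once, in the loop body's scope, so they are usable for drawing below.
        double angle = t * kTickAngle;
        int mhx = static_cast<int>(x0 + (circleRad - 50) * std::sin(angle));
        int mhy = static_cast<int>(y0 - (circleRad - 50) * std::cos(angle));
        HandTip tip = {mhx, mhy};
        tips.push_back(tip);             // stands in for plotHands(...) with mhx, mhy
    }
    return tips;
}
```

Calling it with startTick = 30 returns the 30 positions from half past to the top of the hour and resets startTick to 0, mirroring minPhase = 0 in the question.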

Optimization method for finding floating status of an object

The problem to solve is finding the floating status of a floating body, given its weight and the center of gravity.
The function I use calculates the displaced volume and center of buoyancy of the body given sinkage, heel and trim,
where sinkage is a length and heel/trim are angles limited to values from -90 to 90.
The floating status is found when the displaced volume is equal to the weight and the center of gravity is in a vertical line with the center of buoyancy.
I have implemented this as a non-linear Newton-Raphson root-finding problem with 3 variables (sinkage, trim, heel) and 3 equations.
This method works, but needs good initial guesses. So I am hoping to find either a better approach, or a good method to find the initial values.
Below is the code for the newton and jacobian functions used for the Newton-Raphson iteration. The function volume takes the parameters sinkage, heel and trim, and returns the volume and the coordinates of the center of buoyancy.
I also included the absmax and GSolve2 functions; I believe these are taken from Numerical Recipes.
void jacobian(float x[], float weight, float vcg, float tcg, float lcg, float jac[][3], float f0[]) {
float h = 0.0001f;
float temp;
float j_volume, j_vcb, j_lcb, j_tcb;
float f1[3];
volume(x[0], x[1], x[2], j_volume, j_lcb, j_vcb, j_tcb);
f0[0] = j_volume-weight;
f0[1] = j_tcb-tcg;
f0[2] = j_lcb-lcg;
for (int i=0;i<3;i++) {
temp = x[i];
x[i] = temp + h;
volume(x[0], x[1], x[2], j_volume, j_lcb, j_vcb, j_tcb);
f1[0] = j_volume-weight;
f1[1] = j_tcb-tcg;
f1[2] = j_lcb-lcg;
x[i] = temp;
jac[0][i] = (f1[0]-f0[0])/h;
jac[1][i] = (f1[1]-f0[1])/h;
jac[2][i] = (f1[2]-f0[2])/h;
}
}
void newton(float weight, float vcg, float tcg, float lcg, float &sinkage, float &heel, float &trim) {
float x[3] = {10,1,1};
float accuracy = 0.000001f;
int ntryes = 30;
int i = 0;
float jac[3][3];
float max;
float f0[3];
float gauss_f0[3];
while (i < ntryes) {
jacobian(x, weight, vcg, tcg, lcg, jac, f0);
if (sqrt((f0[0]*f0[0]+f0[1]*f0[1]+f0[2]*f0[2])/2) < accuracy) {
break;
}
gauss_f0[0] = -f0[0];
gauss_f0[1] = -f0[1];
gauss_f0[2] = -f0[2];
GSolve2(jac, 3, gauss_f0);
x[0] = x[0]+gauss_f0[0];
x[1] = x[1]+gauss_f0[1];
x[2] = x[2]+gauss_f0[2];
// absmax(x) - Return absolute max value from an array
max = absmax(x);
if (max < 1) max = 1;
if (sqrt((gauss_f0[0]*gauss_f0[0]+gauss_f0[1]*gauss_f0[1]+gauss_f0[2]*gauss_f0[2])) < accuracy*max) {
// Step is negligible: x already holds the updated estimate
break;
}
i++;
}
sinkage = x[0];
heel = x[1];
trim = x[2];
}
int GSolve2(float a[][3],int n,float b[]) {
float x,sum,max,temp;
int i,j,k,p,m,pos;
int nn = n-1;
for (k=0;k<=n-1;k++)
{
/* pivot*/
max=fabs(a[k][k]);
pos=k;
for (p=k;p<n;p++){
if (max < fabs(a[p][k])){
max=fabs(a[p][k]);
pos=p;
}
}
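if (fabs(a[pos][k]) < EPS) { /* the pivot chosen above is a[pos][k] */
writeLog("Matrix is singular");
return 0; /* do not back-substitute with a singular matrix */
}
if (pos != k) {
for(m=k;m<n;m++){
temp=a[pos][m];
a[pos][m]=a[k][m];
a[k][m]=temp;
}
/* swap the right-hand side too, or b no longer matches the rows of a */
temp=b[pos];
b[pos]=b[k];
b[k]=temp;
}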
/* convert to upper triangular form */
if ( fabs(a[k][k])>=1.e-6)
{
for (i=k+1;i<n;i++)
{
x = a[i][k]/a[k][k];
for (j=k+1;j<n;j++) a[i][j] = a[i][j] -a[k][j]*x;
b[i] = b[i] - b[k]*x;
}
}
else
{
writeLog("zero pivot found in line:%d",k);
return 0;
}
}
/* back substitution */
b[nn] = b[nn] / a[nn][nn];
for (i=n-2;i>=0;i--)
{
sum = b[i];
for (j=i+1;j<n;j++)
sum = sum - a[i][j]*b[j];
b[i] = sum/a[i][i];
}
return 0;
}
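// Returns the largest absolute value among the 3 entries of x.
// sizeof(x) cannot be used for the length: x decays to a pointer, so
// sizeof(x) is the pointer size (e.g. 8), not the number of elements.
float absmax(float x[]) {
    const int n = 3; // newton() always passes a 3-element array
    float max = fabs(x[0]);
    for (int i = 1; i < n; i++) {
        if (max < fabs(x[i])) {
            max = fabs(x[i]);
        }
    }
    return max;
}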
Have you considered using a stochastic search method to find the initial values and then fine-tuning with Newton-Raphson? One possibility is evolutionary computation; you can use the Inspyred package (a Python library). For a physical problem similar in many ways to the one you describe, look at this example: http://inspyred.github.com/tutorial.html#lunar-explorer
What about using a damped version of Newton's method? You could quite easily modify your implementation to do this. Think of Newton's method as finding a direction
$$d_k = \frac{f(x_k)}{f'(x_k)}$$
(for your 3-variable system, $d_k = J(x_k)^{-1} f(x_k)$, which is the step your GSolve2 call already solves for, with the opposite sign) and updating the variable
$$x_{k+1} = x_k - L_k d_k$$
In the usual Newton's method, L_k is always 1, but this can cause overshoots or undershoots. So let your method choose L_k. Suppose that your method usually overshoots. A possible strategy is to take the largest L_k in the set {1, 1/2, 1/4, 1/8, ..., L_min} such that the condition
$$|f(x_{k+1})| \le \left(1 - \frac{L_k}{2}\right) |f(x_k)|$$
is satisfied (or L_min if none of the values satisfies this criterion).
With the same criterion, another possible strategy is to start with L_0 = 1 and, if the criterion is not met, try L_0/2 until it works (or until L_0 = L_min). Then for L_1, start with min(1, 2L_0) and do the same. Then start with L_2 = min(1, 2L_1), and so on.
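Here is a minimal C++11 sketch of the first strategy (halving L_k from 1 down to L_min), written for a generic 3-equation residual so it can be tested on its own. It uses double rather than float, a forward-difference Jacobian similar to the question's jacobian(), and Gaussian elimination with partial pivoting that swaps the right-hand side along with the rows. The names dampedNewton, solve3 and numericJacobian, the tolerances and L_min = 1/1024 are assumptions for illustration, not the poster's volume() model.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <functional>
#include <utility>

typedef std::array<double, 3> Vec3;
typedef std::array<Vec3, 3> Mat3;
typedef std::function<Vec3(const Vec3&)> Residual;

double norm(const Vec3& v) {
    return std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
}

// Forward-difference Jacobian J[i][j] = d f_i / d x_j (fx = f(x) is reused).
Mat3 numericJacobian(const Residual& f, const Vec3& x, const Vec3& fx) {
    const double h = 1e-7;
    Mat3 J;
    for (int j = 0; j < 3; ++j) {
        Vec3 xh = x;
        xh[j] += h;
        Vec3 fh = f(xh);
        for (int i = 0; i < 3; ++i) J[i][j] = (fh[i] - fx[i]) / h;
    }
    return J;
}

// Solve J d = b by Gaussian elimination with partial pivoting.
// Rows of J and entries of b are swapped together. Returns false if singular.
bool solve3(Mat3 J, Vec3 b, Vec3& d) {
    for (int k = 0; k < 3; ++k) {
        int p = k;
        for (int i = k + 1; i < 3; ++i)
            if (std::fabs(J[i][k]) > std::fabs(J[p][k])) p = i;
        if (std::fabs(J[p][k]) < 1e-12) return false;
        std::swap(J[p], J[k]);
        std::swap(b[p], b[k]);
        for (int i = k + 1; i < 3; ++i) {
            double m = J[i][k] / J[k][k];
            for (int j = k; j < 3; ++j) J[i][j] -= m * J[k][j];
            b[i] -= m * b[k];
        }
    }
    for (int i = 2; i >= 0; --i) {       // back substitution
        double s = b[i];
        for (int j = i + 1; j < 3; ++j) s -= J[i][j] * d[j];
        d[i] = s / J[i][i];
    }
    return true;
}

// Damped Newton: d_k = J^{-1} f(x_k), x_{k+1} = x_k - L_k d_k, taking the
// largest L_k in {1, 1/2, 1/4, ..., Lmin} with
// |f(x_{k+1})| <= (1 - L_k/2) |f(x_k)|, or Lmin if none qualifies.
bool dampedNewton(const Residual& f, Vec3& x, double tol = 1e-10, int maxIter = 100) {
    const double Lmin = 1.0 / 1024;
    for (int it = 0; it < maxIter; ++it) {
        Vec3 fx = f(x);
        double fn = norm(fx);
        if (fn < tol) return true;
        Vec3 d;
        if (!solve3(numericJacobian(f, x, fx), fx, d)) return false;
        Vec3 trial;
        for (double L = 1.0; ; L /= 2) {
            for (int i = 0; i < 3; ++i) trial[i] = x[i] - L * d[i];
            if (L <= Lmin || norm(f(trial)) <= (1.0 - L / 2) * fn) break;
        }
        x = trial;
    }
    return norm(f(x)) < tol;
}
```

To use it with the hull model, the residual would wrap volume() and return {volume - weight, tcb - tcg, lcb - lcg}, as the question's jacobian() does. Residuals built from atan() are a handy test case, because undamped Newton diverges on atan(x) from starting points far from the root.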
By the way: are you sure that your problem has a unique solution? I guess that the answer to this question depends on the shape of your object. If you have a rugby ball, there's one angle that you cannot fix. So if your shape is close to such an object, I would not be surprised that the problem is difficult to solve for that angle.

Tensor Product Algorithm Optimization

double data[12] = {1, z, z^2, z^3, 1, y, y^2, y^3, 1, x, x^2, x^3};
double result[64] = {1, z, z^2, z^3, y, zy, (z^2)y, (z^3)y, y^2, z(y^2), (z^2)(y^2), (z^3)(y^2), y^3, z(y^3), (z^2)(y^3), (z^3)(y^3), x, zx, (z^2)x, (z^3)x, yx, zyx, (z^2)yx, (z^3)yx, (y^2)x, z(y^2)x, (z^2)(y^2)x, (z^3)(y^2)x, (y^3)x, z(y^3)x, (z^2)(y^3)x, (z^3)(y^3)x, x^2, z(x^2), (z^2)(x^2), (z^3)(x^2), y(x^2), zy(x^2), (z^2)y(x^2), (z^3)y(x^2), (y^2)(x^2), z(y^2)(x^2), (z^2)(y^2)(x^2), (z^3)(y^2)(x^2), (y^3)(x^2), z(y^3)(x^2), (z^2)(y^3)(x^2), (z^3)(y^3)(x^2), x^3, z(x^3), (z^2)(x^3), (z^3)(x^3), y(x^3), zy(x^3), (z^2)y(x^3), (z^3)y(x^3), (y^2)(x^3), z(y^2)(x^3), (z^2)(y^2)(x^3), (z^3)(y^2)(x^3), (y^3)(x^3), z(y^3)(x^3), (z^2)(y^3)(x^3), (z^3)(y^3)(x^3)};
What is the fastest way (fewest operations) to produce result given data? Assume that data is variable in size, but always a multiple of 4 (e.g., 4, 8, 12, etc.).
No Boost. I am trying to keep my dependencies small. STL algorithms are OK.
HINT: the result array size should always be 4^(data size / 4) (e.g., 4, 16, 64, etc.).
BONUS: If you can compute result just given x, y, z
Additional examples:
double data[4] = {1, z, z^2, z^3};
double result[4] = {1, z, z^2, z^3};
double data[8] = {1, z, z^2, z^3, 1, y, y^2, y^3};
double result[16] = { ... };
I chose the accepted answer code after running this benchmark: https://gist.github.com/1232406. Basically, the top two codes were run and the one with the smallest execution time won.
#include <array>
#include <vector>

void Tensor(std::vector<double>& result, double x, double y, double z) {
    result.resize(64); // almost a no-op if already the right size
    double tz = z*z;
    double ty = y*y;
    double tx = x*x;
    // Powers of z, y and x, in the same layout as data in the question
    std::array<double, 12> data = {{1, z, tz, tz*z, 1, y, ty, ty*y, 1, x, tx, tx*x}};
    // No "register": it is deprecated in C++11 and not allowed as a storage class since C++17
    std::vector<double>::iterator iter = result.begin();
    for(int xi=0; xi<4; ++xi) {
        for(int yi=0; yi<4; ++yi) {
            double xy = data[4+yi]*data[8+xi]; // y^yi * x^xi
            *iter = xy; // a smart compiler can do these four in parallel
            *(++iter) = z*xy;
            *(++iter) = data[2]*xy;
            *(++iter) = data[3]*xy;
            ++iter; // workaround for speed!
        }
    }
}
There's probably at least one bug in here somewhere, but it should be fast, with no dependencies (outside of std::vector/std::array), and it takes just x, y, z. I avoided recursion, though, so it only works for 3 inputs/64 outputs. The concept can be applied to any number of parameters; you just have to write out the loops yourself.
A good compiler should autovectorize this; I guess none of my compilers are good. (This is C99 code: restrict is not a C++ keyword, so in C++ use the __restrict extension supported by GCC, Clang and MSVC.)
void tensor(const double *restrict data,
int dimensions,
double *restrict result) {
result[0] = 1.0;
for (int i = 0; i < dimensions; i++) {
for (int j = (1 << (i * 2)) - 1; j > -1; j--) {
double alpha = result[j];
{
double *restrict dst = &result[j * 4];
const double *restrict src = &data[(dimensions - 1 - i) * 4];
for (int k = 0; k < 4; k++) dst[k] = alpha * src[k];
}
}
}
}
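The same expand-one-variable-at-a-time idea can be written in C++ with std::vector, which also covers the BONUS case (only the variable values are passed). This is a sketch under stated assumptions: tensorPowers is a hypothetical name, and the variables are listed slowest-varying first, so {x, y, z} reproduces the layout of result in the question (z fastest). Each output entry costs one multiplication, plus two per variable for its powers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds the tensor product of {1, v, v^2, v^3} for each v in vars.
// vars is ordered slowest-varying first, e.g. {x, y, z} yields z fastest,
// matching the layout of result in the question.
std::vector<double> tensorPowers(const std::vector<double>& vars) {
    std::vector<double> result(1, 1.0);
    for (std::size_t v = 0; v < vars.size(); ++v) {
        const double p1 = vars[v], p2 = p1 * p1, p3 = p2 * p1;
        const double powers[4] = {1.0, p1, p2, p3};
        std::vector<double> next;
        next.reserve(result.size() * 4);
        for (std::size_t j = 0; j < result.size(); ++j)   // every previous term...
            for (int k = 0; k < 4; ++k)                    // ...times each power of v
                next.push_back(result[j] * powers[k]);
        result.swap(next);
    }
    return result;
}
```

For {x, y, z} the result has 64 entries; entry xi*16 + yi*4 + zi holds x^xi * y^yi * z^zi.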
You should use a dynamic-programming approach, that is, reuse previous results. For example, keep the y^2 result and use it when computing (y^2)z instead of computing it again.
#include <vector>
#include <cstddef>
#include <cmath>
void Tensor(std::vector<double>& result, const std::vector<double>& variables, size_t index)
{
double p1 = variables[index];
double p2 = p1*p1;
double p3 = p1*p2;
if (index == variables.size() - 1) {
result.push_back(1);
result.push_back(p1);
result.push_back(p2);
result.push_back(p3);
} else {
Tensor(result, variables, index+1);
ptrdiff_t size = result.size();
for(int j=0; j<size; ++j)
result.push_back(result[j]*p1);
for(int j=0; j<size; ++j)
result.push_back(result[j]*p2);
for(int j=0; j<size; ++j)
result.push_back(result[j]*p3);
}
}
std::vector<double> Tensor(const std::vector<double>& params) {
    std::vector<double> result;
    size_t rsize = size_t(1) << (2*params.size()); // 4^n entries for n parameters
    result.reserve(rsize);
    Tensor(result, params, 0); // the recursion starts at the first parameter
    return result;
}
int main() {
std::vector<double> params;
params.push_back(3.1415926535);
params.push_back(2.7182818284);
params.push_back(42);
params.push_back(65536);
std::vector<double> result = Tensor(params);
}
I verified that this one compiles and runs (http://ideone.com/IU1eQ). It runs fast, with no dependencies (outside of std::vector), and it takes any number of parameters. Since calling the recursive form is awkward, I made a wrapper. It makes one function call per parameter and one dynamic allocation (the reserve in the wrapper).
You should look at Pascal's pyramid to get a fast solution. Useful link 1, useful link 2, useful link 3 and useful link 4.
One more thing: as I see it, this would be the basis of a finite element solver. Writing your own BLAS is usually not a good idea; do not reinvent the wheel! I think you should use a BLAS library such as Intel MKL or a CUDA-based BLAS.