Margin of optimisation in C++ armadillo code - c++

I am trying to port a fairly complex MATLAB function to C++ using the Armadillo library, but I have serious performance problems: my C++ version is much slower than the MATLAB one, which is a bit odd. I was wondering whether any of you can spot where I can improve my code and give me some suggestions. The problem lies in the part where I try to minimise a function. This is the MATLAB code:
lambda = fminbnd(@SSE, lower_limit,upper_limit,options, y0, x);
function L=SSE(lambda, alpha, y)
N=size(y,2);
len=size(y,1);
for i=1:N
z(:,i)=jglog(y(:,i),alpha,lambda);
end
s = 0;
mean_spec=mean(z,2);
for i=1:N
for j=1:len
s = s + (z(j,i)-mean_spec(j,1))^2;
end
end
L=s;
end
function z=glog(y,alpha,lambda) % Glog transform
z=log((y-alpha)+sqrt((y-alpha).^2+lambda));
end
function [zj, gmn]=jglog(y,y0,lambda)
z=glog(y,y0,lambda);
gmn=exp(mean(log(sqrt((y-y0).^2+lambda))));
zj=z.*gmn;
end
Following this post, I downloaded a C++ version of the MATLAB minimisation code (Brent's method). You can find it here.
This is my c++ version.
class SSE_c : public brent::func_base //create functor
{
public:
mat A;
double offset;
virtual double operator() (double lam)
{
return SSE(A,offset,lam);
}
SSE_c(mat a,double of) {A=a;offset=of;}
};
SSE_c fun(x,y0);
brent::glomin(low_limit,up_limit,c,100,step_threshold,step_threshold,fun,lambda);
double SSE(mat& m,double ofs,double lam)
{
mat z(m.n_rows,m.n_cols);
for(uint i=0;i<m.n_cols;i++)
z.col(i) = jglog(m.col(i),ofs,lam);
std::cout << "Iteration:" << count++;
double s=0;
vec mean_spec(z.n_rows);
FuncOnMatRows(z,mean_spec,[](rowvec const& w){return mean(w);});
for(uint i=0;i<z.n_cols;i++)
for(uint j=0;j<z.n_rows;j++)
if(is_finite(z(j,i)))
s += pow((z(j,i)-mean_spec(j)),2);
return s;
}
vec jglog(vec&& v,double ofs,double lam)
{
vec g=glogF(v,ofs,lam);
double gmn;
vec interm = log(sqrt(square(v-ofs)+lam));
if(interm.is_finite())
gmn=exp(mean(interm));
else
gmn=exp(mean(interm.elem(find_finite(interm))));
g = g*gmn;
return g;
}
vec glogF(vec&& v,double ofs,double lam)
{
return glogF(v,ofs,lam);
}
vec glogF(vec& v,double ofs,double lam)
{
vec z = log((v-ofs)+sqrt(square(v-ofs)+lam));
return z;
}
template<typename Func>
void FuncOnMatRows(const mat& M,vec& v,Func const & func)
{
for(uint i=0;i<M.n_rows;i++) // operation calculated on rows
{
if(M.row(i).is_finite())
v(i) = func(M.row(i));
else if(!any(M.row(i)>0))
v(i) = NAN;
else
{
rowvec b=M.row(i);
v(i) = func(b.elem(find_finite(b)).t()); //output of .elem is always colvec, so transpose
}
}
}

Related

Idiom for data aggregation and post processing in C++

A common task in programming is to process data on the fly and, when all data are collected, do some post processing. A simple example for this would be the computation of the average (and other statistics), where you can have a class like this
class Statistic {
public:
Statistic() : nr(0), sum(0.0), avg(0.0) {}
void add(double x) { sum += x; ++nr; }
void process() { avg = sum / nr; }
private:
int nr;
double sum;
double avg;
};
A disadvantage of this approach is that we always have to remember to call the process() function after adding all the data. Since in C++ we have things like RAII, this seems like a less than ideal solution.
In Ruby, for example, we can write code like this
class Avg
attr_reader :avg
def initialize
@nr = 0
@sum = 0.0
@avg = nil
if block_given?
yield self
process
end
end
def add(x)
@nr += 1
@sum += x.to_f
end
def process
@avg = @sum / @nr
end
end
which we then can call like this
avg = Avg.new do |a|
data.each {|x| a.add(x)}
end
and the process method is automatically called when exiting the block.
Is there an idiom in C++ that can provide something similar?
For clarification: this question is not about computing the average. It is about the following pattern: feeding data to an object and then, when all the data is fed, triggering a processing step. I am interested in context-based ways to automatically trigger the processing step - or reasons why this would not be a good idea in C++.
"Idiomatic average"
I don't know Ruby, but you can't translate idioms directly anyhow. I know that calculating the average is just an example, so let's see what we can get from that example...
The idiomatic way to calculate the sum and average of the elements in a container is std::accumulate:
std::vector<double> data;
// ... fill data ...
auto sum = std::accumulate( data.begin(), data.end() , 0.0);
auto avg = sum / data.size();
The building blocks are container, iterator and algorithms.
If you do not have the elements to be processed readily available in a container, you can still use the same algorithms, because algorithms only care about iterators. Writing your own iterators requires a bit of boilerplate. The following is just a toy example that calculates the average of the results of calling the same function a certain number of times:
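#include <cstddef>
#include <iostream>
#include <numeric>
template <typename F>
struct my_iter {
F f;
std::size_t count;
// initialise members in declaration order
my_iter(std::size_t count, F f) : f(f), count(count) {}
my_iter& operator++() {
--count;
return *this;
}
auto operator*() { return f(); }
bool operator==(const my_iter& other) const { return count == other.count;}
// std::accumulate compares iterators with !=, which needs to be spelled out before C++20
bool operator!=(const my_iter& other) const { return !(*this == other);}
};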
int main()
{
auto f = [](){return 1.;};
auto begin = my_iter{5,f};
auto end = my_iter{0,f};
auto sum = std::accumulate( begin, end, 0.0);
auto avg = sum / 5;
std::cout << sum << " " << avg;
}
Output is:
5 1
Suppose you have a vector of parameters for a function to be called; then calling std::accumulate is straightforward:
#include <iostream>
#include <vector>
#include <numeric>
int main()
{
auto f = [](int x){return x;};
std::vector<int> v = {1,2,5,10};
// accumulate in double so the running sum is not truncated to int
auto sum = std::accumulate( v.begin(), v.end(), 0.0, [f](double accu,int add) {
return accu + f(add);
});
auto avg = sum / v.size();
std::cout << sum << " " << avg;
}
The last argument to std::accumulate specifies how the elements are added up. Instead of adding them up directly I add up the result of calling the function. Output is:
18 4.5
For your actual question
Taking your question more literally and to answer also the RAII part, here is one way you can make use of RAII with your statistic class:
struct StatisticCollector {
private:
Statistic& s;
public:
StatisticCollector(Statistic& s) : s(s) {}
~StatisticCollector() { s.process(); }
};
int main()
{
Statistic stat;
{
StatisticCollector sc{stat};
//for (...)
// stat.add( x );
} // <- destructor is called here
}
PS: Last but not least there is the alternative to just keep it simple. Your class definition is kinda broken, because all results are private. Once you fix that, it is kinda obvious that you need no RAII to make sure process gets called:
class Statistic {
public:
Statistic() : nr(0), sum(0.0) {}
void add(double x) { sum += x; ++nr; }
double process() { return sum / nr; }
private:
int nr;
double sum;
};
This is the right interface in my opinion. The user cannot forget to call process because to get the result they need to call it. If the only purpose of the class is to accumulate numbers and process the result it should not encapsulate the result. The result is for the user of the class to store.
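As a rough C++ counterpart to the Ruby block in the question, the feeding step can also be passed as a callable to a helper that runs it and then does the processing itself. This is only a sketch: `average_of`, `average_of_values` and the `Feed` parameter are hypothetical names that do not appear in the original class, and returning 0.0 for empty input is an assumption.

```cpp
#include <vector>

class Statistic {
public:
    void add(double x) { sum_ += x; ++nr_; }
    // Assumption: an empty data set yields 0.0 instead of dividing by zero.
    double average() const { return nr_ != 0 ? sum_ / nr_ : 0.0; }
private:
    int nr_ = 0;
    double sum_ = 0.0;
};

// Hypothetical helper: runs the caller's feeding code, then processes.
// The caller never sees a half-filled Statistic, so there is no
// separate process() call to forget.
template <typename Feed>
double average_of(Feed&& feed) {
    Statistic s;
    feed(s);             // caller adds data here
    return s.average();  // processing step is triggered automatically
}

// Example use: the lambda plays the role of the Ruby block.
double average_of_values(const std::vector<double>& data) {
    return average_of([&data](Statistic& s) {
        for (double x : data) s.add(x);
    });
}
```

Compared with the scope-guard version, nothing happens in a destructor here, so the result can be returned directly to the caller.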

C++ Random number generator: Trying to get the results as vectors but getting an error by making void function to do it

Previously I asked how to make a random number generator in C++ here, and with other people's help I got it right.
Now, I'm trying to return the result as vectors instead of a series of numbers, but it doesn't seem to be working right. I know that void is not meant to return anything, but using double instead of void didn't work either.
To give more details, I'm trying to return two containers named x_coord and y_coord that contain all the result from x_coord.push_back(oldRoot) and y_coord.push_back(newRoot). I'm doing this because I need them for later use. What is the best way to do this? Thank you for help in advance.
#include <iostream>
#include <vector>
using namespace std;
// Generate random x and y coordinates for 128 particles
class Random {
public:
double oldRoot;
double newRoot;
int iteNum;
Random(double aOldRoot, double aNewRoot, int aIteNum) {
oldRoot = aOldRoot;
newRoot = aNewRoot;
iteNum = aIteNum;
}
void generate() {
vector<double> x_coord;
vector<double> y_coord;
int count = 0;
while (count <= iteNum) {
double totalRoot = oldRoot + newRoot;
if (totalRoot > 1.0) {
oldRoot = newRoot;
newRoot = totalRoot - 1.0;
x_coord.push_back(oldRoot);
y_coord.push_back(newRoot);
}
else {
oldRoot = newRoot;
newRoot = totalRoot;
x_coord.push_back(oldRoot);
y_coord.push_back(newRoot);
}
count += 1;
}
return x_coord, y_coord;
}
};
int main() {
Random random10(0.1412, 0.2343, 16);
random10.generate();
return 0;
}
A simple option would be to return the 2 vectors as a std::pair, like this:
std::pair<std::vector<double>, std::vector<double>> generate() {
// fill up the vectors
return {x_coord, y_coord};
}
However, I would suggest storing the x and y co-ordinates together in a data structure, like this:
std::vector<std::pair<double, double>> xy_coord;
since the x and y co-ordinates should probably be pairwise connected.
You can insert pairs of randomly generated numbers like this:
xy_coord.push_back({oldRoot, newRoot});
The error appears because you declared generate() with return type void while its body contains return x_coord, y_coord;
To fix it, the return type must be something like std::pair<vector<double>, vector<double>>, because x_coord and y_coord are both vector<double> and you want to return them together as a pair.
Basically, you could do something like:
std::pair<vector<double>, vector<double>> generate() {
vector<double> x_coord;
vector<double> y_coord;
//Put your code here...
return {x_coord, y_coord};
}
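The suggestion above to keep each x and y together can be sketched as follows. This is a hedged illustration, not the asker's class: `generate_pairs` is a hypothetical free function that repeats the update rule from the question's generate() and returns a single vector of pairs.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical free-function variant of Random::generate() from the question.
// Returns (oldRoot, newRoot) pairs; the loop runs iteNum + 1 times, as in the
// original while (count <= iteNum).
std::vector<std::pair<double, double>> generate_pairs(double oldRoot,
                                                      double newRoot,
                                                      int iteNum) {
    std::vector<std::pair<double, double>> xy_coord;
    if (iteNum >= 0) {
        xy_coord.reserve(static_cast<std::size_t>(iteNum) + 1);
    }
    for (int count = 0; count <= iteNum; ++count) {
        const double totalRoot = oldRoot + newRoot;
        oldRoot = newRoot;
        // Wrap back into range when the sum exceeds 1.0, as in the question.
        newRoot = (totalRoot > 1.0) ? totalRoot - 1.0 : totalRoot;
        xy_coord.push_back({oldRoot, newRoot});
    }
    return xy_coord;
}
```

The caller can then store the result, for example `auto coords = generate_pairs(0.1412, 0.2343, 16);`, and read `coords[i].first` and `coords[i].second` later.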

Write a function that may return either one or more values

Suppose I want to write a function that, say, returns the sum of f(x) for x in a certain range.
double func() {
double sum = 0.;
for (int i=0; i<100; i++) {
sum += f(i);
}
return sum;
}
But sometimes, in addition to the final sum I also need the partial terms, so I can do
pair<vector<double>,double> func_terms() {
double sum = 0.;
vector<double> terms(100);
for (int i=0; i<100; i++) {
terms[i] = f(i);
sum += terms[i];
}
return {terms, sum};
}
The thing is, this is code duplication. This seems very harmless in this example, but suppose the function is much larger (which it is in the situation that prompted me to ask this), and the two versions differ in just a handful of lines (in this example the logic is the same, only the latter version stores the term in a vector before adding to sum, and returns a pair with that vector; any other logic is equivalent). Then I will have to write and maintain two nearly-identical versions of the same function, differing only in a couple of lines and in the return statement. My question is if there is an idiom/pattern/best practice to deal with this kind of problem. Something that would let me share the common code between the two versions.
In short: I can write two functions and have to maintain two nearly-identical versions. Or I can just use the latter but that will be very wasteful whenever I just need the sum, which is unacceptable. What's the best pattern to deal with this?
I reckon that with C++17 one can do something like
template<bool partials>
double func(vector<double>* terms=nullptr) {
double sum = 0.;
if constexpr (partials)
*terms = vector<double>(100);
for (int i=0; i<100; i++) {
if constexpr (partials) {
(*terms)[i] = f(i);
sum += (*terms)[i];
} else {
sum += f(i);
}
}
return sum;
}
Which comes very close to what I intended, apart from using pointers (I can't use references because terms may be empty).
Your question title says "Write a function that may return either one or more values", but it's more than that; as your example shows, the function may also do a lot of different things long before a result is returned. There really is no general solution to such a broad problem.
However, for the specific case you've explained I'd like to offer a low-tech solution. You could simply implement both functions in terms of a third function and give that third function a parameter to determine whether the extra functionality is performed or not.
Here is a C++17 example, in which that third function is called func_impl and more or less hidden inside a namespace to make life easier for the client of func and func_terms:
namespace detail {
enum class FuncOption {
WithTerms,
WithoutTerms
};
std::tuple<std::vector<double>, double> func_impl(FuncOption option) {
auto const withTerms = option == FuncOption::WithTerms;
double sum = 0.;
std::vector<double> terms(withTerms ? 100 : 0);
for (int i = 0; i < 100; ++i) {
auto const result = f(i);
if (withTerms) {
terms[i] = result;
}
sum += result;
}
return std::make_tuple(terms, sum);
}
}
double func() {
using namespace detail;
return std::get<double>(func_impl(FuncOption::WithoutTerms));
}
std::tuple<std::vector<double>, double> func_terms() {
using namespace detail;
return func_impl(FuncOption::WithTerms);
}
Whether that's too low-tech is up to you and depends on your exact problem.
Here was a solution that suggested to pass an optional pointer to vector and to fill it only if present. I deleted it as other answers mention it as well and as the latter solution looks much more elegant.
You can abstract your calculation to iterators, so callers remain very simple and no code is copied:
auto make_transform_counting_iterator(int i) {
return boost::make_transform_iterator(
boost::make_counting_iterator(i),
f);
}
auto my_begin() {
return make_transform_counting_iterator(0);
}
auto my_end() {
return make_transform_counting_iterator(100);
}
double only_sum() {
return std::accumulate(my_begin(), my_end(), 0.0);
}
std::vector<double> fill_terms() {
std::vector<double> result;
std::copy(my_begin(), my_end(), std::back_inserter(result));
return result;
}
One simple way is to write a common function and use an input parameter to switch the extra work on or off. Like this:
double logic(vector<double>* terms) {
double sum = 0.;
for (int i=0; i<100; i++) {
double term = f(i);
// store the term only when the caller asked for it
if (terms != nullptr) {
terms->push_back(term);
}
sum += term;
}
return sum;
}
double func() {
return logic(nullptr);
}
pair<vector<double>,double> func_terms() {
vector<double> terms;
double sum = logic(&terms);
return {terms, sum};
}
This method is used in many situations. The logic can be very complicated, with many input options, and you can reuse the same logic through different parameters.
In most cases, though, we do not need that many return values, just different input parameters.
If you are not for:
std::pair<std::vector<double>, double> func_terms() {
std::vector<double> terms(100);
for (int i = 0; i != 100; ++i) {
terms[i] = f(i);
}
return {terms, std::accumulate(terms.begin(), terms.end(), 0.)};
}
then maybe:
template <typename Accumulator>
void func_helper(Accumulator&& acc) { // forwarding reference, so a temporary lambda can be passed
for (int i=0; i<100; i++) {
acc(f(i));
}
}
double func()
{
double sum = 0;
func_helper([&sum](double d) { sum += d; });
return sum;
}
std::pair<std::vector<double>, double> func_terms() {
double sum = 0.;
std::vector<double> terms;
func_helper([&](double d) {
sum += d;
terms.push_back(d);
});
return {terms, sum};
}
The simplest solution for this situation I think would be something like this:
double f(int x) { return x * x; }
auto terms(int count) {
auto res = vector<double>{};
generate_n(back_inserter(res), count, [i=0]() mutable {return f(i++);});
return res;
}
auto func_terms(int count) {
const auto ts = terms(count);
return make_pair(ts, accumulate(begin(ts), end(ts), 0.0));
}
auto func(int count) {
return func_terms(count).second;
}
Live version.
But this approach gives func() different performance characteristics to your original version. There are ways around this with the current STL but this highlights an area where the STL is not ideal for composability. The Ranges v3 library offers a better approach to composing algorithms for this type of problem and is in the process of standardization for a future version of C++.
In general there is often a tradeoff between composability / reuse and optimal performance. At its best C++ lets us have our cake and eat it too but this is an example where there is work underway to give standard C++ better approaches to handle this sort of situation.
I worked out an OOP solution, where a base class always computes the sum and makes the current term available to derived classes, this way:
class Func
{
public:
Func() { sum = 0.; }
void func()
{
for (int i=0; i<100; i++)
{
double term = f(i);
sum += term;
useCurrentTerm(term);
}
}
double getSum() const { return sum; }
protected:
virtual void useCurrentTerm(double) {} //do nothing
private:
double f(double d){ return d * 42;}
double sum;
};
So a derived class can implement the virtual method and expose extra properties (other than sum):
class FuncWithTerms : public Func
{
public:
FuncWithTerms() { terms.reserve(100); }
std::vector<double> getTerms() const { return terms; }
protected:
void useCurrentTerm(double t) { terms.push_back(t); }
private:
std::vector<double> terms;
};
If one doesn't want to expose these classes, one could fall back to functions and use them as a façade (still two functions, but very manageable now):
double sum_only_func()
{
Func f;
f.func();
return f.getSum();
}
std::pair<std::vector<double>, double> with_terms_func()
{
FuncWithTerms fwt;
fwt.func();
return { fwt.getTerms(), fwt.getSum() };
}

How to add a vector to a vector of vectors inside functions in c++

I naively tried to add a vector to a vector of vectors inside a function:
the function gets a vector of vectors as a pointer (called r) and it adds to r another vector.
std::vector<double> b;
b.push_back(0);
b.push_back(0);
b.push_back(0);
b.push_back(v0*sin(theta));
b.push_back(0);
b.push_back(v0*cos(theta));
b.push_back(0);
cout<<"befor"<<endl;
(*r).push_back(b);
cout<<"after"<<endl;
but the function crashes when it tries to add b to r. I believe it's because b is an internal object, but why does the program crash when it tries to add b and not afterwards (when the function ends)? To check this I added print lines "before" and "after" (it prints "before" but not "after", and then it crashes).
Can anyone explain to me what is the best way to add a vector to a vector of vectors inside a function?
The full function looks like this:
void solve_all(double v0,double theta,double omega,double phi,double
dt_in,double total_time,std::string solver, double eps_in,
std::vector<std::vector<double>>* r)
{
double delta,eta,miu,diff,time,eps_f,dt_f,eps_t;
std::vector<double> r_1(7),r_2(7),r_temp(7);
int i,good;
mone_f=0;
std::vector<double> b;
b.push_back(0);
b.push_back(0);
b.push_back(0);
b.push_back(v0*sin(theta));
b.push_back(0);
b.push_back(v0*cos(theta));
b.push_back(0);
cout<<"befor"<<endl;
(*r).push_back(b);
cout<<"after"<<endl;
i=1;
time=b[6];
dt_f=dt_in;
eps_f=eps_in;
delta=eps_f*dt_f/(total_time-time);
eta=delta/4;
miu=delta/2;
eps_t=1e-10;
while(total_time-time>eps_t)
{
cout<<time<<endl;
good=0;
while(good==0)
{
if(dt_f>total_time-time){dt_f=total_time-time;}
solve(&b,&r_1,dt_f,omega,phi,solver);
solve(&b,&r_temp,dt_f/2,omega,phi,solver);
solve(&r_temp,&r_2,dt_f/2,omega,phi,solver);
diff=0;
for(int j=0; j<=5; j++)
{
double subs;
subs=r_2[j]-r_1[j];
if(subs<0){subs=subs*(-1);}
if(subs>diff){diff=subs;}
}
if(diff>delta)
{
dt_f=dt_f/2;
}
else
{
if(miu<diff&&diff<delta)
{
good=1;
dt_f=0.9*dt_f;
}
else
{
if(eta<diff&&diff<miu)
{
good=1;
}
else
{
if(eta>diff)
{
good=1;
dt_f=1.1*dt_f;
}
}
}
}
i++;
b.clear();
if(solver.compare("RK2")==0)
{
for(int j=0; j<=6;j++)
{
b.push_back((4*r_2[j]-r_1[j])/3);
}
}
if(solver.compare("RK3")==0)
{
for(int j=0; j<=6;j++)
{
b.push_back((8*r_2[j]-r_1[j])/7);
}
}
(*r).push_back(b);
eps_f=eps_f-diff;
time=b[6];
delta=eps_f*dt_f/(total_time-time);
eta=delta/4;
miu=delta/2;
}
}
}
related functions:
double big_f(double v)
{
return (0.0039+0.0058/(1+exp((v-35)/5)));
}
double f1(double vx,double vy,double vz,double omega ,double phi)
{
double v;
mone_f++;
v=sqrt(pow(vx,2)+pow(vy,2)+pow(vz,2));
return (-big_f(v)*v*vx+B*omega*(vz*sin(phi)-vy*cos(phi)));
}
double f2(double vx,double vy,double vz,double omega ,double phi)
{
double v;
mone_f++;
v=sqrt(pow(vx,2)+pow(vy,2)+pow(vz,2));
return (-big_f(v)*v*vy+B*omega*(vx*cos(phi)));
}
double f3(double vx,double vy,double vz,double omega ,double phi)
{
double v;
mone_f++;
v=sqrt(pow(vx,2)+pow(vy,2)+pow(vz,2));
return (-g-big_f(v)*v*vz-B*omega*(vx*sin(phi)));
}
void solve(std::vector<double>* r,std::vector<double>* r_new,double
l,double omega, double phi, std::string solver)
{
double vx,vy,vz,vx2,vy2,vz2,vx3,vy3,vz3;
if (solver.compare("RK2")==0)
{
vx=(*r)[3];
vy=(*r)[4];
vz=(*r)[5];
vx2=vx+0.5*l*f1(vx,vy,vz,omega,phi);
vy2=vy+0.5*l*f2(vx,vy,vz,omega,phi);
vz2=vz+0.5*l*f3(vx,vy,vz,omega,phi);
(*r_new).push_back((*r)[0]+l*vx2);
(*r_new).push_back((*r)[1]+l*vy2);
(*r_new).push_back((*r)[2]+l*vz2);
(*r_new).push_back((*r)[3]+l*f1(vx2,vy2,vz2,omega,phi));
(*r_new).push_back((*r)[4]+l*f2(vx2,vy2,vz2,omega,phi));
(*r_new).push_back((*r)[5]+l*f3(vx2,vy2,vz2,omega,phi));
(*r_new).push_back((*r)[6]+l);
}
if(solver.compare("RK3")==0)
{
vx=(*r)[3];
vy=(*r)[4];
vz=(*r)[5];
vx2=vx+0.5*l*f1(vx,vy,vz,omega,phi);
vy2=vy+0.5*l*f2(vx,vy,vz,omega,phi);
vz2=vz+0.5*l*f3(vx,vy,vz,omega,phi);
vx3=vx+2*l*f1(vx2,vy2,vz2,omega,phi)-l*f1(vx,vy,vz,omega,phi);
vy3=vy+2*l*f2(vx2,vy2,vz2,omega,phi)-l*f2(vx,vy,vz,omega,phi);
vz3=vz+2*l*f3(vx2,vy2,vz2,omega,phi)-l*f3(vx,vy,vz,omega,phi);
(*r_new).push_back((*r)[0]+l*(vx+4*vx2+vx3)/6);
(*r_new).push_back((*r)[1]+l*(vy+4*vy2+vy3)/6);
(*r_new).push_back((*r)[2]+l*(vz+4*vz2+vz3)/6);
(*r_new).push_back((*r)[3]+l*(f1(vx,vy,vz,omega,phi)+4*f1(vx2,vy2,vz2,omega,phi)+f1(vx3,vy3,vz3,omega,phi))/6);
(*r_new).push_back((*r)[4]+l*(f2(vx,vy,vz,omega,phi)+4*f2(vx2,vy2,vz2,omega,phi)+f2(vx3,vy3,vz3,omega,phi))/6);
(*r_new).push_back((*r)[5]+l*(f3(vx,vy,vz,omega,phi)+4*f3(vx2,vy2,vz2,omega,phi)+f3(vx3,vy3,vz3,omega,phi))/6);
(*r_new).push_back((*r)[6]+l);
}
}
If you take a reference std::vector<std::vector<double>>& or pointer std::vector<std::vector<double>>* you can push_back as many and any std::vector<double> you want. It doesn't matter if you're pushing back stack variables, push_back creates a copy.
Your problem is probably that your r pointer is invalid. Try checking that it isn't a nullptr, or if it was pointing to a stack variable that fell out of scope - in this case it does matter. Alternatively, and preferably, change your function definition to take a reference and see if the compiler complains.
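A minimal sketch of the reference-taking signature suggested above, assuming a hypothetical helper `append_initial_state` that builds the same seven-element start vector as the question, and a hypothetical caller `make_trajectory` that owns the outer vector. Because push_back stores a copy, the outer vector stays valid after the local `b` goes out of scope, and a reference parameter cannot be null.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: takes the outer vector by reference instead of by pointer.
void append_initial_state(std::vector<std::vector<double>>& r,
                          double v0, double theta) {
    // Same layout as in the question: x, y, z, vx, vy, vz, time.
    std::vector<double> b{0.0, 0.0, 0.0,
                          v0 * std::sin(theta), 0.0, v0 * std::cos(theta),
                          0.0};
    r.push_back(b);  // copies b into r; b itself may safely go out of scope
}

// Hypothetical caller: the outer vector lives here, so the callee always
// receives a valid object.
std::vector<std::vector<double>> make_trajectory(double v0, double theta) {
    std::vector<std::vector<double>> r;
    append_initial_state(r, v0, theta);
    return r;
}
```

If the original pointer-based signature has to stay, the equivalent call would pass the address of a live vector, for example `solve_all(..., &r)` with `r` declared in the caller.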

convert from recursive to iterative function cuda c++

I'm working on a genetic program in which I am porting some of the heavy lifting into CUDA. (Previously just OpenMP).
It's not running very fast, and I'm getting an error related to the recursion:
Stack size for entry function '_Z9KScoreOnePdPiS_S_P9CPPGPNode' cannot be statically determined
I've added a chunk of the logic which runs on CUDA. I believe it's enough to show how it's working. I'd be happy to hear about other optimizations I could add, but I would really like to take out the recursion if it will speed things up.
Examples on how this could be achieved are very welcome.
__device__ double Fadd(double a, double b) {
return a + b;
};
__device__ double Fsubtract(double a, double b) {
return a - b;
};
__device__ double action (int fNo, double aa , double bb, double cc, double dd) {
switch (fNo) {
case 0 :
return Fadd(aa,bb);
case 1 :
return Fsubtract(aa,bb);
case 2 :
return Fmultiply(aa,bb);
case 3 :
return Fdivide(aa,bb);
default:
return 0.0;
}
}
__device__ double solve(int node,CPPGPNode * dev_m_Items,double * var_set) {
if (dev_m_Items[node].is_terminal) {
return var_set[dev_m_Items[node].tNo];
} else {
double values[4];
for (unsigned int x = 0; x < 4; x++ ) {
if (x < dev_m_Items[node].fInputs) {
values[x] = solve(dev_m_Items[node].children[x],dev_m_Items,var_set);
} else {
values[x] = 0.0;
}
}
return action(dev_m_Items[node].fNo,values[0],values[1],values[2],values[3]);
}
}
__global__ void KScoreOne(double *scores,int * root_nodes,double * targets,double * cases,CPPGPNode * dev_m_Items) {
int pid = blockIdx.x;
// We only work if this node needs to be calculated
if (root_nodes[pid] != -1) {
for (unsigned int case_no = 0; case_no < FITNESS_CASES; case_no ++) {
double result = solve(root_nodes[pid],dev_m_Items,&cases[case_no]);
double target = targets[case_no];
scores[pid] += abs(result - target);
}
}
}
I'm having trouble making any stack examples work for a large tree structure, which is what this solves.
I've solved this issue now. It was not quite a case of placing the recursive arguments into a stack but it was a very similar system.
As part of the creation of the node tree, I append each node to a vector. I now solve the problem in reverse using reverse Polish notation (http://en.wikipedia.org/wiki/Reverse_Polish_notation), which fits very nicely as each node contains either a value or a function to perform.
It's also ~20% faster than the recursive version, so I'm pleased!
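The reverse Polish approach described above can be sketched as a loop over nodes stored in postfix order with an explicit value stack. This is host-side C++ for illustration only: `PostfixNode`, `apply_binary`, `eval_postfix`, `MAX_STACK`, the restriction to binary operators, and plain division for function 3 are all assumptions, not the original `CPPGPNode` layout or the original `Fdivide`. The function codes follow `action` from the question.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical flattened node: either a terminal (variable lookup)
// or a binary function, stored in postfix order.
struct PostfixNode {
    bool is_terminal;  // true: push var_set[tNo]; false: apply function fNo
    int tNo;
    int fNo;
};

// Assumed upper bound on how many values are pending at once.
const std::size_t MAX_STACK = 64;

// Same numbering as action() in the question; division is assumed unprotected.
double apply_binary(int fNo, double a, double b) {
    switch (fNo) {
        case 0: return a + b;
        case 1: return a - b;
        case 2: return a * b;
        case 3: return a / b;
        default: return 0.0;
    }
}

// Evaluates the expression without recursion. Returns 0.0 for a malformed
// sequence or when the fixed-size stack would overflow.
double eval_postfix(const std::vector<PostfixNode>& nodes, const double* var_set) {
    double stack[MAX_STACK];
    std::size_t top = 0;
    for (const PostfixNode& n : nodes) {
        if (n.is_terminal) {
            if (top == MAX_STACK) return 0.0;
            stack[top++] = var_set[n.tNo];
        } else {
            if (top < 2) return 0.0;
            const double b = stack[--top];  // right operand was pushed last
            const double a = stack[--top];
            stack[top++] = apply_binary(n.fNo, a, b);
        }
    }
    return top == 1 ? stack[0] : 0.0;
}
```

Since the loop uses a fixed-size array and no recursion, a device-side version of the same structure should let the compiler determine the stack size statically, which is what the original warning was about.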