Fixing Neural Net vanishing gradients problem? - c++

This is going to be a long one. I am still very new to coding (I started 3 months ago), so I know my code is not perfect; any criticism beyond the question is more than welcome. I have specifically avoided using pointers because I do not fully understand them. I can use them, but I don't trust that I will use them correctly in a program like this.
First things first, I have a version of this where there is only 1 hidden layer and the net works perfectly. I have started running into problems since I tried to expand the number of hidden layers.
Some info on the net:
-I am using softmax output activation as I have 3 output neurons.
-I am using tanh as my activation function on the rest of the net.
-The file being read for the input has a format of
"input: 0.56 0.76 0.23 0.67"
"output: 0.0 0.0 1.0" (this is the target)
-The weights connecting a layer 1 neuron to a layer 2 neuron are stored in the layer 1 neuron.
-The biases for each neuron are stored in that neuron.
-The target is 1.0 0.0 0.0 if the sum of the input numbers is below 1, 0.0 1.0 0.0 if the sum is between 1 and 2, and 0.0 0.0 1.0 if the sum is above 2.
-I am using L1 regularization.
The problems specifically are:
The softmax output values do not move from a relatively equalised range. Positions 1 and 2 in the target vector each occur roughly 50% of the time, while position 3 occurs less than 3% of the time, so by relatively equalised I mean the softmax output generally looks something like
"0.56.... 0.48.... 0.02..." even after 500 epochs.
The weights at the hidden layer closest to the input layer don't change much at all, which is what I think vanishing gradients are. I might be wrong on this. But the weights at the hidden layer closest to the output end up between -50 and 50 (which I think is okay?).
Things I have tried:
I have tried using ReLU, parametric ReLU, and exponential ReLU, but with all of these the softmax output value for neuron 3 keeps rising while the other 2 neurons' values keep falling. These values continue on their trajectory until either 500 epochs have been reached or they turn into NaNs. (I think this has to do with the structure of my code rather than the ReLU function itself.)
If I set the number of hidden layers above 3 while using ReLU, it immediately spits out NaNs, within the first epoch.
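A note on the NaNs described above: one common way to get them, and only a guess as to what is happening here, is overflow in the softmax. With an unbounded activation such as ReLU the output pre-activations can grow large, exp() of a large double returns infinity, and infinity divided by infinity is NaN. A minimal sketch of the usual workaround, subtracting the largest pre-activation before exponentiating, follows. The function name stable_softmax and the free-function layout are assumptions for illustration and are not part of the posted code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax computed after subtracting the largest logit, so every exponent
// is <= 0 and exp() cannot overflow to infinity.
// (stable_softmax is a hypothetical helper, not taken from the post.)
std::vector<double> stable_softmax(const std::vector<double>& logits)
{
    assert(!logits.empty());
    const double maxLogit = *std::max_element(logits.begin(), logits.end());

    std::vector<double> probs(logits.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i)
    {
        probs[i] = std::exp(logits[i] - maxLogit);
        sum += probs[i];
    }
    // The largest logit contributes exp(0) = 1, so sum is at least 1.
    for (double& p : probs) p /= sum;
    return probs;
}
```

Subtracting the maximum does not change the result mathematically, because the common factor cancels in the division.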
The backprop function is pretty long, but this is specifically because I have deconstructed it many times over to try and figure out where I might be mismatching values or something. I do have it in a condensed version but I feel I have a higher chance of being completely off the mark there than I do if I have it deconstructed.
I have included the ReLU function code that I used. It is the first time I have used it, so I might be wrong on that as well, but I don't think so; I have double-checked multiple times. The Relu in the code is specifically "ELU", or exponential ReLU.
Here is the code for the net:
#include <iostream>
#include <fstream>
#include <cmath>
#include <vector>
#include <sstream>
#include <random>
#include <string>
#include <iomanip>
double randomt(double x, double y)
{
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_real_distribution<double> dist(x, y);
return dist(mt);
}
class InputN
{
public:
double val{};
std::vector <double> weights{};
};
class HiddenN
{
public:
double preactval{};
double actval{};
double actvalPD{};
double preactvalpd{};
std::vector <double> weights{};
double bias{};
};
class OutputN
{
public:
double preactval{};
double actval{};
double preactvalpd{};
double bias{};
};
class Net
{
public:
std::vector <InputN> inneurons{};
std::vector <std::vector <HiddenN>> hiddenneurons{};
std::vector <OutputN> outputneurons{};
double lambda{ 0.015 };
double alpha{ 0.02 };
};
double tanhderiv(double val)
{
return 1 - tanh(val) * tanh(val);
}
double Relu(double val)
{
if (val < 0) return 0.01 *(exp(val) - 1);
else return val;
}
double Reluderiv(double val)
{
if (val < 0) return Relu(val) + 0.01;
else return 1;
}
double regularizer(double weight)
{
double absval{};
if (weight < 0) absval = weight - weight - weight;
else if (weight > 0 || weight == 0) absval = weight;
else;
if (absval > 0) return 1;
else if (absval < 0) return -1;
else if (absval == 0) return 0;
else return 2;
}
void feedforward(Net& net)
{
double sum{};
int prevlayer{};
for (size_t Hsize = 0; Hsize < net.hiddenneurons.size(); Hsize++)
{
//std::cout << "in first loop" << '\n';
prevlayer = Hsize - 1;
for (size_t Hel = 0; Hel < net.hiddenneurons[Hsize].size(); Hel++)
{
//std::cout << "in second loop" << '\n';
if (Hsize == 0)
{
//std::cout << "in first if" << '\n';
for (size_t Isize = 0; Isize < net.inneurons.size(); Isize++)
{
//std::cout << "in fourth loop" << '\n';
sum += (net.inneurons[Isize].val * net.inneurons[Isize].weights[Hel]);
}
net.hiddenneurons[Hsize][Hel].preactval = net.hiddenneurons[Hsize][Hel].bias + sum;
net.hiddenneurons[Hsize][Hel].actval = tanh(sum);
sum = 0;
//std::cout << "first if done" << '\n';
}
else
{
//std::cout << "in else" << '\n';
for (size_t prs = 0; prs < net.hiddenneurons[prevlayer].size(); prs++)
{
//std::cout << "in fourth loop" << '\n';
sum += net.hiddenneurons[prevlayer][prs].actval * net.hiddenneurons[prevlayer][prs].weights[Hel];
}
//std::cout << "fourth loop done" << '\n';
net.hiddenneurons[Hsize][Hel].preactval = net.hiddenneurons[Hsize][Hel].bias + sum;
net.hiddenneurons[Hsize][Hel].actval = tanh(sum);
//std::cout << "else done" << '\n';
sum = 0;
}
}
}
//std::cout << "first loop done " << '\n';
int lasthid = net.hiddenneurons.size() - 1;
for (size_t Osize = 0; Osize < net.outputneurons.size(); Osize++)
{
for (size_t Hsize = 0; Hsize < net.hiddenneurons[lasthid].size(); Hsize++)
{
sum += (net.hiddenneurons[lasthid][Hsize].actval * net.hiddenneurons[lasthid][Hsize].weights[Osize]);
}
net.outputneurons[Osize].preactval = net.outputneurons[Osize].bias + sum;
}
}
void softmax(Net& net)
{
double sum{};
for (size_t Osize = 0; Osize < net.outputneurons.size(); Osize++)
{
sum += exp(net.outputneurons[Osize].preactval);
}
for (size_t Osize = 0; Osize < net.outputneurons.size(); Osize++)
{
net.outputneurons[Osize].actval = exp(net.outputneurons[Osize].preactval) / sum;
}
}
void lossfunc(Net& net, std::vector <double> target)
{
int pos{ -1 };
double val{};
for (size_t t = 0; t < target.size(); t++)
{
pos += 1;
if (target[t] > 0)
{
break;
}
}
for (size_t s = 0; net.outputneurons.size(); s++)
{
val = -log(net.outputneurons[pos].actval);
}
}
void backprop(Net& net, std::vector<double>& target)
{
for (size_t outI = 0; outI < net.outputneurons.size(); outI++)
{
double PD = target[outI] - net.outputneurons[outI].actval;
net.outputneurons[outI].preactvalpd = PD * -1;
}
size_t lasthid = net.hiddenneurons.size() - 1;
for (size_t LH = 0; LH < net.hiddenneurons[lasthid].size(); LH++)
{
for (size_t LHW = 0; LHW < net.hiddenneurons[lasthid][LH].weights.size(); LHW++)
{
double weight = net.hiddenneurons[lasthid][LH].weights[LHW];
double PD = net.outputneurons[LHW].preactvalpd * net.hiddenneurons[lasthid][LH].actval;
PD = PD * -1;
double delta = PD - (net.lambda * regularizer(weight));
weight = weight + (net.alpha * delta);
net.hiddenneurons[lasthid][LH].weights[LHW] = weight;
}
}
for (size_t OB = 0; OB < net.outputneurons.size(); OB++)
{
double bias = net.outputneurons[OB].bias;
double BPD = net.outputneurons[OB].preactvalpd;
BPD = BPD * -1;
double Delta = BPD;
bias = bias + (net.alpha * Delta);
}
for (size_t HPD = 0; HPD < net.hiddenneurons[lasthid].size(); HPD++)
{
double PD{};
for (size_t HW = 0; HW < net.outputneurons.size(); HW++)
{
PD += net.hiddenneurons[lasthid][HPD].weights[HW] * net.outputneurons[HW].preactvalpd;
}
net.hiddenneurons[lasthid][HPD].actvalPD = PD;
PD = 0;
}
for (size_t HPD = 0; HPD < net.hiddenneurons[lasthid].size(); HPD++)
{
net.hiddenneurons[lasthid][HPD].preactvalpd = net.hiddenneurons[lasthid][HPD].actvalPD * tanhderiv(net.hiddenneurons[lasthid][HPD].preactval);
}
for (size_t AllHid = net.hiddenneurons.size() - 2; AllHid > -1; AllHid--)
{
size_t uplayer = AllHid + 1;
for (size_t cl = 0; cl < net.hiddenneurons[AllHid].size(); cl++)
{
for (size_t clw = 0; clw < net.hiddenneurons[AllHid][cl].weights.size(); clw++)
{
double weight = net.hiddenneurons[AllHid][cl].weights[clw];
double PD = net.hiddenneurons[uplayer][clw].preactvalpd * net.hiddenneurons[AllHid][cl].actval;
PD = PD * -1;
double delta = PD - (net.lambda * regularizer(weight));
weight = weight + (net.alpha * delta);
net.hiddenneurons[AllHid][cl].weights[clw] = weight;
}
}
for (size_t up = 0; up < net.hiddenneurons[uplayer].size(); up++)
{
double bias = net.hiddenneurons[uplayer][up].bias;
double PD = net.hiddenneurons[uplayer][up].preactvalpd;
PD = PD * -1;
double delta = PD;
bias = bias + (net.alpha * delta);
}
for (size_t APD = 0; APD < net.hiddenneurons[AllHid].size(); APD++)
{
double PD{};
for (size_t APDW = 0; APDW < net.hiddenneurons[AllHid][APD].weights.size(); APDW++)
{
PD += net.hiddenneurons[AllHid][APD].weights[APDW] * net.hiddenneurons[uplayer][APDW].preactvalpd;
}
net.hiddenneurons[AllHid][APD].actvalPD = PD;
PD = 0;
}
for (size_t PPD = 0; PPD < net.hiddenneurons[AllHid].size(); PPD++)
{
double PD = net.hiddenneurons[AllHid][PPD].actvalPD * tanhderiv(net.hiddenneurons[AllHid][PPD].preactval);
net.hiddenneurons[AllHid][PPD].preactvalpd = PD;
}
}
for (size_t IN = 0; IN < net.inneurons.size(); IN++)
{
for (size_t INW = 0; INW < net.inneurons[IN].weights.size(); INW++)
{
double weight = net.inneurons[IN].weights[INW];
double PD = net.hiddenneurons[0][INW].preactvalpd * net.inneurons[IN].val;
PD = PD * -1;
double delta = PD - (net.lambda * regularizer(weight));
weight = weight + (net.alpha * delta);
net.inneurons[IN].weights[INW] = weight;
}
}
for (size_t hidB = 0; hidB < net.hiddenneurons[0].size(); hidB++)
{
double bias = net.hiddenneurons[0][hidB].bias;
double PD = net.hiddenneurons[0][hidB].preactvalpd;
PD = PD * -1;
double delta = PD;
bias = bias + (net.alpha * delta);
net.hiddenneurons[0][hidB].bias = bias;
}
}
int main()
{
std::vector <double> invals{ };
std::vector <double> target{ };
Net net;
InputN Ineuron;
HiddenN Hneuron;
OutputN Oneuron;
int IN = 4;
int HIDLAYERS = 4;
int HID = 8;
int OUT = 3;
for (int i = 0; i < IN; i++)
{
net.inneurons.push_back(Ineuron);
for (int m = 0; m < HID; m++)
{
net.inneurons.back().weights.push_back(randomt(0.0, 0.5));
}
}
//std::cout << "first loop done" << '\n';
for (int s = 0; s < HIDLAYERS; s++)
{
net.hiddenneurons.push_back(std::vector <HiddenN>());
if (s == HIDLAYERS - 1)
{
for (int i = 0; i < HID; i++)
{
net.hiddenneurons[s].push_back(Hneuron);
for (int m = 0; m < OUT; m++)
{
net.hiddenneurons[s].back().weights.push_back(randomt(0.0, 0.5));
}
net.hiddenneurons[s].back().bias = 1.0;
}
}
else
{
for (int i = 0; i < HID; i++)
{
net.hiddenneurons[s].push_back(Hneuron);
for (int m = 0; m < HID; m++)
{
net.hiddenneurons[s].back().weights.push_back(randomt(0.0, 0.5));
}
net.hiddenneurons[s].back().bias = 1.0;
}
}
}
//std::cout << "second loop done" << '\n';
for (int i = 0; i < OUT; i++)
{
net.outputneurons.push_back(Oneuron);
net.outputneurons.back().bias = randomt(0.0, 0.5);
}
//std::cout << "third loop done" << '\n';
int count{};
std::ifstream fileread("N.txt");
for (int epoch = 0; epoch < 500; epoch++)
{
count = 0;
if (epoch == 100 || epoch == 100 * 2 || epoch == 100 * 3 || epoch == 100 * 4 || epoch == 499)
{
printvals("no", net);
}
fileread.clear(); fileread.seekg(0, std::ios::beg);
while (fileread.is_open())
{
std::cout << '\n' << "epoch: " << epoch << '\n';
std::string fileline{};
fileread >> fileline;
if (fileline == "in:")
{
std::string input{};
double nums{};
std::getline(fileread, input);
std::stringstream ss(input);
while (ss >> nums)
{
invals.push_back(nums);
}
}
if (fileline == "out:")
{
std::string output{};
double num{};
std::getline(fileread, output);
std::stringstream ss(output);
while (ss >> num)
{
target.push_back(num);
}
}
count += 1;
if (count == 2)
{
for (size_t inv = 0; inv < invals.size(); inv++)
{
net.inneurons[inv].val = invals[inv];
}
//std::cout << "calling feedforward" << '\n';
feedforward(net);
//std::cout << "ff done" << '\n';
softmax(net);
printvals("output", net);
std::cout << "target: " << '\n';
for (auto element : target) std::cout << element << " / ";
std::cout << '\n';
backprop(net, target);
invals.clear();
target.clear();
count = 0;
}
if (fileread.eof()) break;
}
}
//std::cout << "fourth loop done" << '\n';
return 0;
}
Much appreciated to anyone who actually made it through all that! :)
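One thing in the posted backprop() that may be worth checking, offered as a hedged observation rather than a confirmed diagnosis: the loop over the earlier hidden layers uses a size_t counter with the condition AllHid > -1. In that comparison -1 is converted to the largest size_t value, so the condition is false on the first test and the loop body would never run, which would be consistent with the weights near the input layer barely changing. The output biases and the biases updated inside that loop are also computed into local variables without being stored back, unlike the first hidden layer's biases. The sketch below shows one way to walk indices backwards with an unsigned counter; layers_back_to_front and original_condition are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the indices count - 2, count - 3, ..., 0 in visiting order,
// i.e. every layer except the last one, walked from back to front.
std::vector<std::size_t> layers_back_to_front(std::size_t count)
{
    std::vector<std::size_t> visited;
    // Keep the counter one above the index being visited, so the unsigned
    // value never has to drop below zero to end the loop.
    for (std::size_t next = (count > 0 ? count - 1 : 0); next > 0; --next)
    {
        const std::size_t layer = next - 1;
        visited.push_back(layer);
    }
    return visited;
}

// The condition from the post, isolated: -1 becomes the largest
// std::size_t value, so no index can ever be greater than it.
bool original_condition(std::size_t index)
{
    return index > static_cast<std::size_t>(-1);
}
```

With 4 hidden layers this visits layers 2, 1 and 0, which is the range the original loop appears to be aiming for.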

Related

Implementation of SGD with momentum slows net down

I've been working on a neural net class that I can later turn into a library of my own. I'm primarily doing this to get a good understanding of nets, and I've been reading all the formulas from pure maths lectures, so I might have a few small details wrong. (I had no idea how to do any of this before I started.)
In this net I have coded a normal SGD algorithm and then a momentum algorithm (or at least what I think is one).
When I run the net on my simple data set using SGD, it works perfectly, no problems at all. But if I try using SGD with momentum, the net does not learn at all; even after 10000 iterations the loss stays around 0.7.
I have been back and forth, referencing the formula from many places, and while I still doubt I completely understand it, I feel like it is definitely something in my code, but I can't figure it out. I have tried many combinations of alpha and lambda values and many reasonable combinations of layers and neurons (specifically more than one hidden layer with the momentum formula, but it doesn't work with 1 layer either).
I am going to post the code for the full net, so if anyone is willing to scan through it quickly and see if there's anything that seems obviously wrong, that would be much appreciated. I feel the fault might lie in the Updateweights() function, since that is where most of the calculation happens, but it could also be in the calcema() function.
I have tried changing the weight update formula from W = W - (alpha * partial derivative) to W = W + (alpha * PD) (and keeping the PD positive instead of making it negative), and I also tried removing the regularizer from the momentum update formula, but none of it has made a difference.
I am still very new to this and trying my best, so any feedback is appreciated.
Here is a sample from the input file:
in: 0.6 0.34 0.32 0.78
out: 1.0 0.0 0.0
in: 0.36 0.52 0.75 0.67
out: 1.0 0.0 0.0
in: 0.29 0.034 0.79 0.5
out: 0.0 1.0 0.0
in: 0.21 0.29 0.47 0.62
out: 0.0 1.0 0.0
in: 0.67 0.57 0.42 0.19
out: 0.0 1.0 0.0
in: 0.48 0.22 0.79 0.0096
out: 0.0 1.0 0.0
in: 0.75 0.48 0.61 0.67
out: 1.0 0.0 0.0
in: 0.41 0.96 0.65 0.074
out: 1.0 0.0 0.0
in: 0.19 0.88 0.68 0.1
out: 0.0 1.0 0.0
in: 0.9 0.89 0.95 0.45
out: 1.0 0.0 0.0
in: 0.71 0.58 0.95 0.013
out: 1.0 0.0 0.0
in: 0.66 0.043 0.073 0.98
out: 0.0 1.0 0.0
in: 0.12 0.37 0.2 0.22
out: 0.0 0.0 1.0
in: 0.11 0.38 0.54 0.64
out: 0.0 1.0 0.0
in: 0.42 0.81 0.94 0.98
out: 1.0 0.0 0.0
If anyone would like the full input file, let me know. I just don't know how to post files on here, but I will find a way.
So my problem specifically is that when I use SGD with momentum (or what I think is SGD with momentum), my net does not learn at all and gets stuck at a loss of 0.7..., but if I use normal SGD it works perfectly.
The code:
#include <iostream>
#include <vector>
#include <iomanip>
#include <cmath>
#include <random>
#include <fstream>
#include <chrono>
#include <sstream>
#include <string>
#include <assert.h>
double Relu(double val)
{
if (val < 0) return 0.01 * (exp(val) - 1);
else return val;
}
double Reluderiv(double val)
{
if (val < 0) return Relu(val) + 0.01;
else return 1;
}
double randdist(double x, double y)
{
return sqrt(2.0 / (x + y));
}
int randomt(int x, int y)
{
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_real_distribution<double> dist(x, y);
return round(dist(mt));
}
class INneuron
{
public:
double val{};
std::vector <double> weights{};
std::vector <double> weightderivs{};
std::vector <double> emavals{};
};
class HIDneuron
{
public:
double preactval{};
double actval{};
double actvalPD{};
double preactvalPD{};
std::vector <double> weights{};
std::vector <double> weightderivs{};
std::vector <double> emavals{};
double bias{};
double biasderiv{};
double biasema{};
};
class OUTneuron
{
public:
double preactval{};
double actval{};
double preactvalPD{};
double bias{};
double biasderiv{};
double biasema{};
};
class Net
{
public:
Net(int netdimensions, int hidlayers, int hidneurons, int outneurons, int inneurons, double lambda, double alpha)
{
NETDIMENSIONS = netdimensions; HIDLAYERS = hidlayers; HIDNEURONS = hidneurons; OUTNEURONS = outneurons; INNEURONS = inneurons; Lambda = lambda; Alpha = alpha;
}
void defineoptimizer(std::string optimizer);
void Feedforward(const std::vector <double>& invec);
void Backprop(const std::vector <double>& targets);
void Updateweights();
void printvalues(double totalloss);
void Initweights();
void softmax();
double regularize(double weight,std::string type);
double lossfunc(const std::vector <double>& target);
void calcema(int Layernum, int neuron, int weight, std::string layer, std::string BorW);
private:
INneuron Inn;
HIDneuron Hidn;
OUTneuron Outn;
std::vector <std::vector <HIDneuron>> Hidlayers{};
std::vector <INneuron> Inlayer{};
std::vector <OUTneuron> Outlayer{};
double NETDIMENSIONS{};
double HIDLAYERS{};
double HIDNEURONS{};
double OUTNEURONS{};
double INNEURONS{};
double Lambda{};
double Alpha{};
double loss{};
int optimizerformula{};
};
void Net::defineoptimizer(std::string optimizer)
{
if (optimizer == "ExpAvrg")
{
optimizerformula = 1;
}
else if (optimizer == "SGD")
{
optimizerformula = 2;
}
else if (optimizer == "Adam")
{
optimizerformula = 3;
}
else if (optimizer == "MinibatchSGD")
{
optimizerformula = 4;
}
else {
std::cout << "no optimizer matching description" << '\n';
abort();
}
}
double Net::regularize(double weight,std::string type)
{
if (type == "L1")
{
double absval{ weight };
/*if (weight < 0) absval = weight * -1;
else if (weight > 0 || weight == 0) absval = weight;
else;*/
if (absval > 0.0) return 1.0;
else if (absval < 0.0) return -1.0;
else if (absval == 0.0) return 0.0;
else return 2;
}
else if (type == "l2")
{
double absval{};
if (weight < 0.0) absval = weight * -1.0;
else absval = weight;
return (2.0 * absval);
}
else { std::cout << "no regularizer recognized" << '\n'; abort(); }
}
void Net::softmax()
{
double sum{};
for (size_t Osize = 0; Osize < Outlayer.size(); Osize++)
{
sum += exp(Outlayer[Osize].preactval);
}
for (size_t Osize = 0; Osize < Outlayer.size(); Osize++)
{
Outlayer[Osize].actval = exp(Outlayer[Osize].preactval) / sum;
}
}
void Net::Initweights()
{
unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
std::default_random_engine generator(seed);
std::normal_distribution<double> distribution(0.0, 1.0);
for (int WD = 0; WD < HIDLAYERS + 1; WD++)
{
if (WD == 0)
{
for (int WL = 0; WL < INNEURONS; WL++)
{
Inlayer.push_back(Inn);
for (int WK = 0; WK < HIDNEURONS; WK++)
{
double val = distribution(generator) * randdist(INNEURONS, HIDNEURONS);
Inlayer.back().weights.push_back(val);
Inlayer.back().weightderivs.push_back(0.0);
Inlayer.back().emavals.push_back(0.0);
}
}
}
else if (WD < HIDLAYERS && WD != 0)
{
Hidlayers.push_back(std::vector <HIDneuron>());
for (int WL = 0; WL < HIDNEURONS; WL++)
{
Hidlayers.back().push_back(Hidn);
for (int WK = 0; WK < HIDNEURONS; WK++)
{
double val = distribution(generator) * randdist(HIDNEURONS, HIDNEURONS);
Hidlayers.back().back().weights.push_back(val);
Hidlayers.back().back().weightderivs.push_back(0.0);
Hidlayers.back().back().emavals.push_back(0.0);
}
Hidlayers.back().back().bias = 0.0;
Hidlayers.back().back().biasderiv = 0.0;
Hidlayers.back().back().biasema = 0.0;
}
}
else if (WD == HIDLAYERS)
{
Hidlayers.push_back(std::vector <HIDneuron>());
for (int WL = 0; WL < HIDNEURONS; WL++)
{
Hidlayers.back().push_back(Hidn);
for (int WK = 0; WK < OUTNEURONS; WK++)
{
double val = distribution(generator) * randdist(HIDNEURONS, OUTNEURONS);
Hidlayers.back().back().weights.push_back(val);
Hidlayers.back().back().weightderivs.push_back(0.0);
Hidlayers.back().back().emavals.push_back(0.0);
}
Hidlayers.back().back().bias = 0.0;
Hidlayers.back().back().biasderiv = 0.0;
Hidlayers.back().back().biasema = 0.0;
}
}
}
for (int i = 0; i < OUTNEURONS; i++)
{
Outlayer.push_back(Outn);
Outlayer.back().bias = 0.0;
Outlayer.back().biasderiv = 0.0;
Outlayer.back().biasema = 0.0;
}
}
void Net::Feedforward(const std::vector <double>& invec)
{
for (size_t I = 0; I < Inlayer.size(); I++)
{
Inlayer[I].val = invec[I];
}
for (size_t h = 0; h < Hidlayers[0].size(); h++)
{
double preval = Hidlayers[0][h].bias;
for (size_t I = 0;I < Inlayer.size(); I++)
{
preval += Inlayer[I].val * Inlayer[I].weights[h];
}
Hidlayers[0][h].preactval = preval;
Hidlayers[0][h].actval = Relu(preval);
}
for (size_t H = 1; H < Hidlayers.size();H++)
{
size_t prevh = H - 1;
for (size_t h = 0; h < Hidlayers[H].size(); h++)
{
double preval = Hidlayers[H][h].bias;
for (size_t p = 0; p < Hidlayers[prevh].size(); p++)
{
preval += Hidlayers[prevh][p].actval * Hidlayers[prevh][p].weights[h];
}
Hidlayers[H][h].preactval = preval;
Hidlayers[H][h].actval = Relu(preval);
}
}
for (size_t O = 0; O < Outlayer.size(); O++)
{
size_t lhid = Hidlayers.size() - 1;
double preval = Outlayer[O].bias;
for (size_t h = 0; h < Hidlayers[lhid].size(); h++)
{
preval += Hidlayers[lhid][h].actval * Hidlayers[lhid][h].weights[O];
}
Outlayer[O].preactval = preval;
}
}
void Net::Backprop(const std::vector <double>& targets)
{
for (size_t O = 0; O < Outlayer.size(); O++)
{
double PDval{};
PDval = targets[O] - Outlayer[O].actval;
PDval = PDval * -1.0;
Outlayer[O].preactvalPD = PDval;
}
for (size_t H = Hidlayers.size(); H > 0; H--)
{
size_t Top = H;
size_t Current = H - 1;
for (size_t h = 0; h < Hidlayers[Current].size(); h++)
{
double actPD{};
double PreactPD{};
double biasPD{};
for (size_t hw = 0; hw < Hidlayers[Current][h].weights.size(); hw++)
{
double PDval{};
if (H == Hidlayers.size())
{
PDval = Outlayer[hw].preactvalPD * Hidlayers[Current][h].actval;
biasPD = Outlayer[hw].preactvalPD;
Outlayer[hw].biasderiv = biasPD;
actPD += Hidlayers[Current][h].weights[hw] * Outlayer[hw].preactvalPD;
calcema(0, hw, 0, "Outlayer", "Bias");
}
else
{
PDval = Hidlayers[Top][h].preactvalPD * Hidlayers[Current][h].actval;
actPD += Hidlayers[Current][h].weights[hw] * Hidlayers[Top][h].preactvalPD;
}
Hidlayers[Current][h].weightderivs[hw] = PDval;
calcema(Current, h, hw, "Hidlayer", "Weight");
}
if (H != Hidlayers.size())
{
biasPD = Hidlayers[Top][h].preactvalPD;
Hidlayers[Top][h].biasderiv = biasPD;
calcema(Top, h, 0, "Hidlayer", "Bias");
}
Hidlayers[Current][h].actvalPD = actPD;
PreactPD = Hidlayers[Current][h].actvalPD * Reluderiv(Hidlayers[Current][h].preactval);
Hidlayers[Current][h].preactvalPD = PreactPD;
actPD = 0;
}
}
for (size_t I = 0; I < Inlayer.size(); I++)
{
double PDval{};
for (size_t hw = 0; hw < Inlayer[I].weights.size(); hw++)
{
PDval = Hidlayers[0][hw].preactvalPD * Inlayer[I].val;
Inlayer[I].weightderivs[hw] = PDval;
double biasPD = Hidlayers[0][hw].preactvalPD;
Hidlayers[0][hw].biasderiv = biasPD;
}
}
}
//PROBABLE CULPRIT
void Net::Updateweights()
{
for (size_t I = 0; I < Inlayer.size(); I++)
{
double PD{};
for (size_t iw = 0; iw < Inlayer[I].weights.size(); iw++)
{
if (optimizerformula == 2)
{
PD = (Inlayer[I].weightderivs[iw] * -1.0) - (Lambda * regularize(Inlayer[I].weights[iw], "L1"));
Inlayer[I].weights[iw] = Inlayer[I].weights[iw] + (Alpha * PD);
}
else if (optimizerformula == 1)
{
PD = (Inlayer[I].emavals[iw] * -1.0) - (Lambda * regularize(Inlayer[I].weights[iw], "L1"));
Inlayer[I].weights[iw] = Inlayer[I].weights[iw] + (Alpha * PD);
}
}
}
for (size_t H = 0; H < Hidlayers.size(); H++)
{
for (size_t h = 0; h < Hidlayers[H].size(); h++)
{
double PD{};
for (size_t hw = 0; hw < Hidlayers[H][h].weights.size(); hw++)
{
if (optimizerformula == 2)
{
PD = (Hidlayers[H][h].weightderivs[hw] * -1.0) - (Lambda * regularize(Hidlayers[H][h].weights[hw], "L1"));
Hidlayers[H][h].weights[hw] = Hidlayers[H][h].weights[hw] + (Alpha * PD);
}
else if (optimizerformula == 1)
{
PD = (Hidlayers[H][h].emavals[hw] * -1.0) - (Lambda * regularize(Hidlayers[H][h].weights[hw], "L1"));
Hidlayers[H][h].weights[hw] = Hidlayers[H][h].weights[hw] + (Alpha * PD);
}
}
if (optimizerformula == 1)
{
PD = Hidlayers[H][h].biasema * -1.0;
Hidlayers[H][h].bias = Hidlayers[H][h].bias + (Alpha * PD);
}
else if (optimizerformula == 2)
{
PD = Hidlayers[H][h].biasderiv * -1.0;
Hidlayers[H][h].bias = Hidlayers[H][h].bias + (Alpha * PD);
}
}
}
for (size_t biases = 0; biases < Outlayer.size(); biases++)
{
if (optimizerformula == 2)
{
double PD = Outlayer[biases].biasderiv * -1.0;
Outlayer[biases].bias = Outlayer[biases].bias + (Alpha * PD);
}
else if (optimizerformula == 1)
{
double PD = Outlayer[biases].biasema * -1.0;
Outlayer[biases].bias = Outlayer[biases].bias + (Alpha * PD);
}
}
}
void Net::printvalues(double totalloss)
{
for (size_t Res = 0; Res < Outlayer.size(); Res++)
{
std::cout << Outlayer[Res].actval << " / ";
}
std::cout << '\n' << "loss = " << totalloss << '\n';
}
double Net::lossfunc(const std::vector <double>& target)
{
int pos{ -1 };
double val{};
for (size_t t = 0; t < target.size(); t++)
{
pos += 1;
if (target[t] > 0)
{
break;
}
}
val = -log(Outlayer[pos].actval);
return val;
}
//OTHER PROBABLE CULPRIT
void Net::calcema(int Layernum, int neuron, int weight, std::string layer, std::string BorW )
{
static double Beta{ 0.9 };
if (BorW == "Weight")
{
if (layer == "Inlayer")
{
Inlayer[neuron].emavals[weight] = (Beta * Inlayer[neuron].emavals[weight]) + ((1.0 - Beta) * Inlayer[neuron].weightderivs[weight]);
}
else if (layer == "Hidlayers")
{
Hidlayers[Layernum][neuron].emavals[weight] = (Beta * Hidlayers[Layernum][neuron].emavals[weight]) + ((1.0 - Beta) * Hidlayers[Layernum][neuron].weightderivs[weight]);
}
}
else if (BorW == "Bias")
{
if (layer == "Hidlayers")
{
Hidlayers[Layernum][neuron].biasema = (Beta * Hidlayers[Layernum][neuron].biasema) + ((1.0 - Beta) * Hidlayers[Layernum][neuron].biasderiv);
}
else if (layer == "Outlayer")
{
Outlayer[neuron].biasema = (Beta * Outlayer[neuron].biasema) + ((1.0 - Beta) * Outlayer[neuron].biasderiv);
}
}
}
int main()
{
std::vector <double> innums{};
std::vector <double> outnums{};
std::vector <std::string> INstrings{};
std::vector <std::string> OUTstrings{};
std::string nums{};
std::string in{};
std::string out{};
double totalloss{};
double loss{};
double single{};
int batchcount{0};
Net net(0, 2, 4, 3, 4, 0.0001, 0.006);
net.Initweights();
net.defineoptimizer("ExpAvrg");
std::ifstream file("N.txt");
while (file.is_open())
{
int count{ 0 };
while (file >> nums)
{
if (nums == "in:")
{
count += 1;
std::getline(file, in);
INstrings.push_back(in);
}
else if (nums == "out:")
{
count += 1;
std::getline(file, out);
OUTstrings.push_back(out);
}
else;
}
break;
}
for (int epoch = 0; epoch < 50000; epoch++)
{
int random = randomt(0, 99);
std::string invals = INstrings[random];
std::string outvals = OUTstrings[random];
std::stringstream in(invals);
std::stringstream out(outvals);
std::cout << "fetching" << '\n';
while (in >> single)
{
innums.push_back(single);
}
while (out >> single)
{
outnums.push_back(single);
}
std::cout << "epoch " << epoch << '\n';
std::cout << "In nums: " << '\n';
for (auto element : innums) std::cout << element << " / ";
std::cout << '\n' << "targets: " << '\n';
for (auto element : outnums) std::cout << element << " / ";
std::cout << '\n';
batchcount += 1;
net.Feedforward(innums);
net.softmax();
loss += net.lossfunc(outnums);
totalloss = loss / batchcount;
net.printvalues(totalloss);
net.Backprop(outnums);
net.Updateweights();
innums.clear();
outnums.clear();
}
std::cout << "in size: "<< INstrings.size() << '\n';
std::cout << "out size: " << OUTstrings.size() << '\n';
}
So, to anyone who might be interested in this: I found the answer.
In my resource, the formula for SGD with momentum is:
Momentumgradient = partial derivative of weight + (beta * previous momentumgradient);
What I was doing wrong was assuming that I was already doing that calculation in my calcema() function. I then just took the value calculated in calcema() and plugged it into a normal SGD formula, replacing the weight derivative value with the momentum gradient value.
The fix to this problem was to simply do exactly as the formula says (feel stupid now),
which is:
In Updateweights():
//previous formula
else if (optimizerformula == 1)
{
    PD = (Hidlayers[H][h].emavals[hw] * -1.0) - (Lambda * regularize(Hidlayers[H][h].weights[hw], "L1"));
    Hidlayers[H][h].weights[hw] = Hidlayers[H][h].weights[hw] + (Alpha * PD);
}
//updated formula
else if (optimizerformula == 1)
{
    PD = ((Hidlayers[H][h].weightderivs[hw] + (0.9 * Hidlayers[H][h].emavals[hw])) * -1.0) - (Lambda * regularize(Hidlayers[H][h].weights[hw], "L1"));
    Hidlayers[H][h].weights[hw] = Hidlayers[H][h].weights[hw] + (Alpha * PD);
}
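For readers comparing against the formula quoted above, here is a minimal sketch of that momentum rule on a flat weight vector, assuming the stored momentum value is written back after every step so the next step can reuse it. The names MomentumState and momentum_step are hypothetical and do not appear in the posted class. It may also be worth checking one detail of the posted code: Backprop() passes the string "Hidlayer" to calcema(), while calcema() compares against "Hidlayers", and calcema() is never called with "Inlayer", so it looks as though the emavals for the input and hidden layers stay at zero. If so, the updated formula would reduce to plain SGD for those weights.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container: one velocity entry per weight, initially zero.
struct MomentumState
{
    std::vector<double> velocity;
};

// One step of SGD with momentum in the form quoted in the answer:
//   v = gradient + beta * v_previous
//   w = w - alpha * v
void momentum_step(std::vector<double>& weights,
                   const std::vector<double>& gradients,
                   MomentumState& state,
                   double alpha, double beta)
{
    assert(weights.size() == gradients.size());
    // Grow the velocity buffer on first use; new entries start at zero.
    state.velocity.resize(weights.size(), 0.0);
    for (std::size_t i = 0; i < weights.size(); ++i)
    {
        // Store the new velocity so the next step can reuse it.
        state.velocity[i] = gradients[i] + beta * state.velocity[i];
        weights[i] -= alpha * state.velocity[i];
    }
}
```

With a constant gradient of 1, alpha = 0.1 and beta = 0.9, the first step moves a weight by 0.1 and the second by 0.19, which shows the accumulated velocity taking effect.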

Logistic Regression Returning Wrong Prediction

I'm trying to implement logistic regression in C++, but the predictions I'm getting are not even close to what I am expecting. I'm not sure if there is an error in my understanding of logistic regression or the code.
I have reviewed the algorithms and messed with the learning rate, but the results are very inconsistent.
#include <cmath>
#include <iostream>
#include <vector>
using namespace std;
double theta[4] = {0,0,0,0};
double x[2][3] = {
{1,1,1},
{9,9,9},
};
double y[2] = {0,1};
//prediction data
double test_x[1][3] = {
{9,9,9},
};
int test_m = sizeof(test_x) / sizeof(test_x[0]);
int m = sizeof(x) / sizeof(x[0]);
int n = sizeof(theta) / sizeof(theta[0]);
int xn = n - 1;
struct Logistic
{
double sigmoid(double total)
{
double e = 2.71828;
double sigmoid_x = 1 / (1 + pow(e, -total));
return sigmoid_x;
}
double h(int x_row)
{
double total = theta[0] * 1;
for(int c1 = 0; c1 < xn; ++c1)
{
total += theta[c1 + 1] * x[x_row][c1];
}
double final_total = sigmoid(total);
//cout << "final total: " << final_total;
return final_total;
}
double cost()
{
double hyp;
double temp_y;
double error;
for(int c1 = 0; c1 < m; ++c1)
{
//passes row of x to h to calculate sigmoid(xi * thetai)
hyp = h(c1);
temp_y = y[c1];
error += temp_y * log(hyp) + (1 - temp_y) * log(1 - hyp);
}// 1 / m
double final_error = -.5 * error;
return final_error;
}
void gradient_descent()
{
double alpha = .01;
for(int c1 = 0; c1 < n; ++c1)
{
double error = cost();
cout << "final error: " << error << "\n";
theta[c1] = theta[c1] - alpha * error;
cout << "theta: " << c1 << " " << theta[c1] << "\n";
}
}
void train()
{
for(int epoch = 0; epoch <= 10; ++epoch)
{
gradient_descent();
cout << "epoch: " << epoch << "\n";
}
}
vector<double> predict()
{
double temp_total;
double total;
vector<double> final_total;
//hypothesis equivalent function
temp_total = theta[0] * 1;
for(int c1 = 0; c1 < test_m; ++c1)
{
for(int c2 = 0; c2 < xn; ++c2)
{
temp_total += theta[c2 + 1] * test_x[c1][c2];
}
total = sigmoid(temp_total);
//cout << "final total: " << final_total;
final_total.push_back(total);
}
return final_total;
}
};
int main()
{
Logistic test;
test.train();
vector<double> prediction = test.predict();
for(int c1 = 0; c1 < test_m; ++c1)
{
cout << "prediction: " << prediction[c1] << "\n";
}
}
Start with a very small learning rate and a larger number of iterations, and try again. I haven't tested your code, but I guess the cost/error/energy jumps from hump to hump.
Somewhat unrelated to your question, but rather than computing e^-total using pow, use exp instead (it's a hell of a lot faster!). Also, there is no need to make the sigmoid function a member function; make it static or just a normal C function (it doesn't require any member variable from your struct).
static double sigmoid(double total)
{
    return 1.0 / (1.0 + exp(-total));
}
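Beyond the learning rate, the gradient_descent() in the question subtracts alpha times the cost itself from every theta, whereas the usual batch update subtracts alpha times each parameter's own partial derivative, (1/m) * sum over rows of (h(x_i) - y_i) * x_ij. A minimal sketch of that update follows, assuming the same layout as the question (theta[0] is the intercept). The names hypothesis and gradient_step and the use of std::vector are illustrative choices, not the asker's code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

double sigmoid(double z)
{
    return 1.0 / (1.0 + std::exp(-z));
}

// theta[0] is the intercept; theta[j + 1] multiplies feature j.
double hypothesis(const std::vector<double>& theta, const std::vector<double>& row)
{
    assert(theta.size() == row.size() + 1);
    double total = theta[0];
    for (std::size_t j = 0; j < row.size(); ++j) total += theta[j + 1] * row[j];
    return sigmoid(total);
}

// One batch gradient descent step: every theta moves against its own
// partial derivative of the cross-entropy cost, averaged over all rows.
void gradient_step(std::vector<double>& theta,
                   const std::vector<std::vector<double>>& x,
                   const std::vector<double>& y,
                   double alpha)
{
    assert(x.size() == y.size() && !x.empty());
    std::vector<double> grad(theta.size(), 0.0);
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        const double err = hypothesis(theta, x[i]) - y[i];
        grad[0] += err;  // the intercept's "feature" is the constant 1
        for (std::size_t j = 0; j < x[i].size(); ++j) grad[j + 1] += err * x[i][j];
    }
    // All gradients are computed before any theta changes (simultaneous update).
    for (std::size_t j = 0; j < theta.size(); ++j)
        theta[j] -= alpha * grad[j] / static_cast<double>(x.size());
}
```

Calling gradient_step repeatedly on the two training rows from the question with alpha = 0.01 should push the prediction for {9,9,9} above 0.5 and the one for {1,1,1} below it. Note also that predict() in the question keeps adding to temp_total across test rows; resetting it to theta[0] for each row avoids that.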

Matrix inversion slower using threads

I wrote a function that computes the inverse of a matrix and then a multithreaded version of it, since I need to invert arrays larger than 2000 x 2000.
Inverting a 1000x1000 array takes 2.5 seconds unthreaded (on an i5-4460, 4 cores, 2.9 GHz),
while the multithreaded version takes 7.25 seconds.
I placed the threads in the part that consumes the most time. What is wrong?
Is it because vectors are used instead of 2-dimensional arrays?
This is the minimum code to test both versions:
#include<iostream>
#include <vector>
#include <stdlib.h>
#include <time.h>
#include <chrono>
#include <thread>
#include <cmath> //fabs()
#include <cstdio> //printf(), getchar()
const int NUCLEOS = 8;
#ifdef __linux__
#include <unistd.h> //usleep()
typedef std::chrono::system_clock t_clock; //try to use high_resolution_clock on new linux x64 computer!
#else
typedef std::chrono::high_resolution_clock t_clock;
#pragma warning(disable:4996)
#endif
using namespace std;
std::chrono::time_point<t_clock> start_time, stop_time = start_time; char null_char = '\0';
void timer(char *title = 0, int data_size = 1) { stop_time = t_clock::now(); double us = (double)chrono::duration_cast<chrono::microseconds>(stop_time - start_time).count(); if (title) printf("%s time = %7lgms = %7lg MOPs\n", title, (double)us*1e-3, (double)data_size / us); start_time = t_clock::now(); }
//makes columns 0
void colum_zero(vector< vector<double> > &x, vector< vector<double> > &y, int pos0, int pos1,int dim, int ord);
//returns inverse of x, x is not modified, not threaded
vector< vector<double> > inverse(vector< vector<double> > x)
{
if (x.size() != x[0].size())
{
cout << "ERROR on inverse() not square array" << endl; getchar(); return{};//returns a null
}
size_t dim = x.size();
int i, j, ord;
vector< vector<double> > y(dim,vector<double>(dim,0));//initializes output = 0
//init_2Dvector(y, dim, dim);
//1. Unity array y:
for (i = 0; i < dim; i++)
{
y[i][i] = 1.0;
}
double diagon, coef;
double *ptrx, *ptry, *ptrx2, *ptry2;
for (ord = 0; ord<dim; ord++)
{
//2. Make the diagonal of x equal to 1
int i2;
if (fabs(x[ord][ord])<1e-15) //If that element is 0, a line that contains a non zero is added
{
for (i2 = ord + 1; i2<dim; i2++)
{
if (fabs(x[i2][ord])>1e-15) break;
}
if (i2 >= dim)
return{};//error, returns null
for (i = 0; i<dim; i++)//added a line without 0
{
x[ord][i] += x[i2][i];
y[ord][i] += y[i2][i];
}
}
diagon = 1.0/x[ord][ord];
ptry = &y[ord][0];
ptrx = &x[ord][0];
for (i = 0; i < dim; i++)
{
*ptry++ *= diagon;
*ptrx++ *= diagon;
}
//uses the same function but not threaded:
colum_zero(x,y,0,dim,dim,ord);
}//end ord
return y;
}
//threaded version
vector< vector<double> > inverse_th(vector< vector<double> > x)
{
if (x.size() != x[0].size())
{
cout << "ERROR on inverse() not square array" << endl; getchar(); return{};//returns a null
}
int dim = (int) x.size();
int i, ord;
vector< vector<double> > y(dim, vector<double>(dim, 0));//initializes output = 0
//init_2Dvector(y, dim, dim);
//1. Unity array y:
for (i = 0; i < dim; i++)
{
y[i][i] = 1.0;
}
std::thread tarea[NUCLEOS];
double diagon;
double *ptrx, *ptry;// , *ptrx2, *ptry2;
for (ord = 0; ord<dim; ord++)
{
//2. Make the diagonal of x equal to 1
int i2;
if (fabs(x[ord][ord])<1e-15) //If the diagonal element is 0, add a row whose entry in this column is nonzero
{
for (i2 = ord + 1; i2<dim; i2++)
{
if (fabs(x[i2][ord])>1e-15) break;
}
if (i2 >= dim)
return{};//error, returns null
for (i = 0; i<dim; i++)//Add the row found above so the diagonal element becomes nonzero and the later division is safe
{
x[ord][i] += x[i2][i];
y[ord][i] += y[i2][i];
}
}
diagon = 1.0 / x[ord][ord];
ptry = &y[ord][0];
ptrx = &x[ord][0];
for (i = 0; i < dim; i++)
{
*ptry++ *= diagon;
*ptrx++ *= diagon;
}
int pos0 = 0, N1 = dim;//initial array position
if ((N1<1) || (N1>5000))
{
cout << "It is detected out than 1-5000 simulations points=" << N1 << " ABORT or press enter to continue" << endl; getchar();
}
//cout << "Initiation of " << NUCLEOS << " threads" << endl;
for (int thread = 0; thread<NUCLEOS; thread++)
{
int pos1 = (int)((thread + 1)*N1 / NUCLEOS);//next position
tarea[thread] = std::thread(colum_zero, std::ref(x), std::ref(y), pos0, pos1, dim, ord);//watch out, coil current=1!
pos0 = pos1;//next thread will work at next point
}
for (int thread = 0; thread<NUCLEOS; thread++)
{
tarea[thread].join();
//cout << "Thread num: " << thread << " end\n";
}
}//end ord
return y;
}
//makes columns 0
void colum_zero(vector< vector<double> > &x, vector< vector<double> > &y, int pos0, int pos1,int dim, int ord)
{
double coef;
double *ptrx, *ptry, *ptrx2, *ptry2;
//Zero out column ord, except for the diagonal element:
for (int i = pos0; i<pos1; i++)//Begin to end for every thread
{
if (i == ord) continue;
coef = x[i][ord];//element to make 0
if (fabs(coef)<1e-15) continue; //If already zero, it is avoided
ptry = &y[i][0];
ptry2 = &y[ord][0];
ptrx = &x[i][0];
ptrx2 = &x[ord][0];
for (int j = 0; j < dim; j++)
{
*ptry++ -= coef * (*ptry2++);//1st matrix
*ptrx++ -= coef * (*ptrx2++);//2nd matrix
}
}
}
void test_6_inverse(int dim)
{
vector< vector<double> > vec1(dim, vector<double>(dim));
for (int i=0;i<dim;i++)
for (int j = 0; j < dim; j++)
{
vec1[i][j] = (-1.0 + 2.0*rand() / RAND_MAX) * 10000;
}
vector< vector<double> > vec2,vec3;
double ini, end;
ini = (double)clock();
vec2 = inverse(vec1);
end = (double)clock();
cout << "=== Time inverse unthreaded=" << (end - ini) / CLOCKS_PER_SEC << endl;
ini=end;
vec3 = inverse_th(vec1);
end = (double)clock();
cout << "=== Time inverse threaded=" << (end - ini) / CLOCKS_PER_SEC << endl;
cout<<vec2[2][2]<<" "<<vec3[2][2]<<endl;//use both results so the inversions are actually performed
cout << endl;
}
int main()
{
test_6_inverse(1000);
cout << endl << "=== END ===" << endl; getchar();
return 1;
}
After looking more closely at the code of colum_zero(), I saw that one thread writes to data that is used by other threads, so the threads are not INDEPENDENT of each other. Fortunately the compiler detects this and avoids it.
Conclusions:
It is not recommended to multithread the Gauss-Jordan method on its own.
If a multithreaded version is slower even though the work is split correctly across the threads, it may be because the results of one thread are used by another.
The main function inverse() works and can be used by other programmers, so this question should not be deleted.
Unanswered question:
Which matrix inversion method can be split across many independent threads so that it can run on a GPU?
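Independently of the explanation above, two things in the test are worth ruling out. First, the timings come from clock(), which on POSIX systems reports CPU time for the whole process, so the time used by several threads is added together. Second, inverse_th() creates and joins NUCLEOS threads once per pivot row, which means thousands of thread launches for a 1000x1000 matrix. Below is a minimal sketch that measures wall-clock time instead, assuming std::chrono::steady_clock is acceptable; the helper names wall_seconds and clock_seconds are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <functional>

// Returns the elapsed wall-clock time, in seconds, of one call to work().
double wall_seconds(const std::function<void()>& work)
{
    assert(work); // an empty std::function cannot be called
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Returns the std::clock() difference, in seconds, of one call to work().
// On POSIX systems this is CPU time summed over all threads of the process.
double clock_seconds(const std::function<void()>& work)
{
    assert(work);
    const std::clock_t start = std::clock();
    work();
    const std::clock_t stop = std::clock();
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

In test_6_inverse() this could be used as wall_seconds([&] { vec3 = inverse_th(vec1); }) and compared with the same call for inverse(). If the wall-clock numbers still show the threaded version losing, the per-pivot thread creation is the next suspect.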

Gradient descent converging towards the wrong value

I'm trying to implement a gradient descent algorithm in C++. Here's the code I have so far :
#include <iostream>
double X[] {163,169,158,158,161,172,156,161,154,145};
double Y[] {52, 68, 49, 73, 71, 99, 50, 82, 56, 46 };
double m, p;
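int n = sizeof(X)/sizeof(X[0]);
// Forward declarations: main() calls functions that are defined further down.
double Loss_function(void);
void gradientStep(double alpha);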
int main(void) {
double alpha = 0.00004; // 0.00007;
m = (Y[1] - Y[0]) / (X[1] - X[0]);
p = Y[0] - m * X[0];
for (int i = 1; i <= 8; i++) {
gradientStep(alpha);
}
return 0;
}
double Loss_function(void) {
double res = 0;
double tmp;
for (int i = 0; i < n; i++) {
tmp = Y[i] - m * X[i] - p;
res += tmp * tmp;
}
return res / 2.0 / (double)n;
}
void gradientStep(double alpha) {
double pg = 0, mg = 0;
for (int i = 0; i < n; i++) {
pg += Y[i] - m * X[i] - p;
mg += X[i] * (Y[i] - m * X[i] - p);
}
p += alpha * pg / n;
m += alpha * mg / n;
}
This code converges towards m = 2.79822, p = -382.666, and an error of 102.88. But if I use my calculator to find out the correct linear regression model, I find that the correct values of m and p should respectively be 1.601 and -191.1.
I also noticed that the algorithm won't converge for alpha > 0.00007, which seems quite low, and the value of p barely changes during the 8 iterations (or even after 2000 iterations).
What's wrong with my code?
Here's a good overview of the algorithm I'm trying to implement. The values of theta0 and theta1 are called p and m in my program.
Other implementation in python
More about the algorithm
This link gives a comprehensive view of the algorithm; it turns out I was following a completely wrong approach.
The following code does not work properly (and I have no plans to work on it further), but it should put anyone facing the same problem on the right track:
#include <vector>
#include <iostream>
typedef std::vector<double> vect;
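std::vector<double> y, omega(2, 0), omega2(2, 0);
std::vector<std::vector<double>> X;
int n = 10;
// Forward declarations: main() calls functions that are defined further down.
void gradientStep(double alpha);
void display(void);
int main(void) {
/* Initialize X so that each member contains (1, x_i) */
/* Initialize y so that each member contains y_i */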
double alpha = 0.00001;
display();
for (int i = 1; i <= 8; i++) {
gradientStep(alpha);
display();
}
return 0;
}
double f_function(const std::vector<double> &x) {
double c = 0; // must start at zero before accumulating
for (unsigned int i = 0; i < omega.size(); i++) {
c += omega[i] * x[i];
}
return c;
}
void gradientStep(double alpha) {
for (int i = 0; i < n; i++) {
for (unsigned int j = 0; j < X[0].size(); j++) {
omega2[j] -= alpha/(double)n * (f_function(X[i]) - y[i]) * X[i][j];
}
}
omega = omega2;
}
void display(void) {
double res = 0, tmp = 0;
for (int i = 0; i < n; i++) {
tmp = y[i] - f_function(X[i]);
res += tmp * tmp; // Loss function
}
std::cout << "omega = ";
for (unsigned int i = 0; i < omega.size(); i++) {
std::cout << "[" << omega[i] << "] ";
}
std::cout << "\tError : " << res * .5/(double)n << std::endl;
}
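For the data in the question, a likely reason the first program seems to converge to the wrong values is the scale of X rather than the update rule: with x values around 160, the step size must stay tiny for the slope to remain stable, and at that step size the intercept p barely moves. A minimal sketch of the same batch gradient descent on standardized inputs follows; the names Line and fit_line and the default alpha and iteration count are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Line
{
    double m; // slope
    double p; // intercept
};

// Fits y = m * x + p by batch gradient descent on standardized inputs
// z = (x - mean) / sd, then maps the result back to the original scale.
Line fit_line(const std::vector<double>& x, const std::vector<double>& y,
              double alpha = 0.1, int iterations = 1000)
{
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());

    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= n;

    double variance = 0.0;
    for (double v : x) variance += (v - mean) * (v - mean);
    const double sd = std::sqrt(variance / n);
    assert(sd > 0.0); // all-equal x values cannot define a slope

    double w0 = 0.0, w1 = 0.0; // model in scaled space: y ~ w0 + w1 * z
    for (int it = 0; it < iterations; ++it)
    {
        double g0 = 0.0, g1 = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i)
        {
            const double z = (x[i] - mean) / sd;
            const double residual = w0 + w1 * z - y[i];
            g0 += residual;     // gradient with respect to w0
            g1 += residual * z; // gradient with respect to w1
        }
        w0 -= alpha * g0 / n;
        w1 -= alpha * g1 / n;
    }

    Line line;
    line.m = w1 / sd;            // undo the scaling of x
    line.p = w0 - line.m * mean; // undo the centering of x
    return line;
}
```

Called with the X and Y arrays from the question, this should land near the m = 1.601 and p = -191.1 quoted there, because after standardization both parameters converge at the same rate.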

C++ Spline interpolation from an array of points

I am writing a bit of code to animate a point using a sequence of positions. In order to have a decent result, I'd like to add some spline interpolation
to smoothen the transitions between positions. All the positions are separated by the same amount of time (let's say 500ms).
int delay = 500;
vector<Point> positions={ {0, 0}, {50, 20}, {150, 100}, {30, 120} };
Here is what I have done for linear interpolation (which seems to work properly), just to give you an idea of what I'm looking for later on:
Point getPositionAt(int currentTime){
Point before, after, result;
int currentIndex = (currentTime / delay) % positions.size();
before = positions[currentIndex];
after = positions[(currentIndex + 1) % positions.size()];
// progress between [before] and [after]
double progress = fmod((((double)currentTime) / (double)delay), (double)positions.size()) - currentIndex;
// Cast the scaled offset, not progress itself: (int)progress is always 0 here.
result.x = before.x + (int)(progress*(after.x - before.x));
result.y = before.y + (int)(progress*(after.y - before.y));
return result;
}
So that was simple, but now what I would like to do is spline interpolation. Thanks!
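For keyframes that are evenly spaced in time, one common option is a uniform Catmull-Rom spline, which passes through every position. This technique is named here only as an illustration and is not something the thread prescribes. The sketch below mirrors the looping behaviour of getPositionAt(); the names Pt, catmull_rom and position_at are hypothetical, and double coordinates are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Pt
{
    double x;
    double y;
};

// Uniform Catmull-Rom basis in one coordinate: the curve passes through
// p1 at t = 0 and through p2 at t = 1; p0 and p3 shape the tangents.
double catmull_rom(double p0, double p1, double p2, double p3, double t)
{
    const double t2 = t * t;
    const double t3 = t2 * t;
    return 0.5 * (2.0 * p1
                  + (-p0 + p2) * t
                  + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t2
                  + (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * t3);
}

// Position at time_ms for keyframes spaced delay_ms apart, looping
// back to the first keyframe after the last one.
Pt position_at(const std::vector<Pt>& keys, int delay_ms, long time_ms)
{
    assert(!keys.empty() && delay_ms > 0 && time_ms >= 0);
    const std::size_t n = keys.size();
    const std::size_t i1 = static_cast<std::size_t>(time_ms / delay_ms) % n;
    const double t = static_cast<double>(time_ms % delay_ms) / delay_ms;
    const Pt& p0 = keys[(i1 + n - 1) % n]; // keyframe before the segment
    const Pt& p1 = keys[i1];               // segment start
    const Pt& p2 = keys[(i1 + 1) % n];     // segment end
    const Pt& p3 = keys[(i1 + 2) % n];     // keyframe after the segment
    Pt result;
    result.x = catmull_rom(p0.x, p1.x, p2.x, p3.x, t);
    result.y = catmull_rom(p0.y, p1.y, p2.y, p3.y, t);
    return result;
}
```

At multiples of the delay the function returns the stored positions themselves; in between, it bends through them instead of cutting corners as the linear version does.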
I had to write a Bezier spline creation routine for an "entity" that was following a path in a game I am working on. I created a base class to handle a "SplineInterface" and then created two derived classes, one based on the classic spline technique (e.g. Sedgewick/Algorithms) and a second one based on Bezier splines.
Here is the code. It is a single header file, with a few includes (most should be obvious):
#ifndef __SplineCommon__
#define __SplineCommon__
#include "CommonSTL.h"
#include "CommonProject.h"
#include "MathUtilities.h"
/* A Spline base class. */
class SplineBase
{
private:
vector<Vec2> _points;
bool _elimColinearPoints;
protected:
/* OVERRIDE THESE FUNCTIONS */
virtual void ResetDerived() = 0;
enum
{
NOM_SIZE = 32,
};
public:
SplineBase()
{
_points.reserve(NOM_SIZE);
_elimColinearPoints = true;
}
const vector<Vec2>& GetPoints() { return _points; }
bool GetElimColinearPoints() { return _elimColinearPoints; }
void SetElimColinearPoints(bool elim) { _elimColinearPoints = elim; }
/* OVERRIDE THESE FUNCTIONS */
virtual Vec2 Eval(int seg, double t) = 0;
virtual bool ComputeSpline() = 0;
virtual void DumpDerived() {}
/* Clear out all the data.
*/
void Reset()
{
_points.clear();
ResetDerived();
}
void AddPoint(const Vec2& pt)
{
// If this new point is colinear with the two previous points,
// pop off the last point and add this one instead.
if(_elimColinearPoints && _points.size() > 2)
{
int N = _points.size()-1;
Vec2 p0 = _points[N-1] - _points[N-2];
Vec2 p1 = _points[N] - _points[N-1];
Vec2 p2 = pt - _points[N];
// We test for colinearity by comparing the slopes
// of the two lines. If the slopes are the same,
// we assume colinearity.
float32 delta = (p2.y-p1.y)*(p1.x-p0.x)-(p1.y-p0.y)*(p2.x-p1.x);
if(MathUtilities::IsNearZero(delta))
{
_points.pop_back();
}
}
_points.push_back(pt);
}
void Dump(int segments = 5)
{
assert(segments > 1);
cout << "Original Points (" << _points.size() << ")" << endl;
cout << "-----------------------------" << endl;
for(int idx = 0; idx < _points.size(); ++idx)
{
cout << "[" << idx << "]" << " " << _points[idx] << endl;
}
cout << "-----------------------------" << endl;
DumpDerived();
cout << "-----------------------------" << endl;
cout << "Evaluating Spline at " << segments << " points." << endl;
for(int idx = 0; idx < _points.size()-1; idx++)
{
cout << "---------- " << "From " << _points[idx] << " to " << _points[idx+1] << "." << endl;
for(int tIdx = 0; tIdx < segments+1; ++tIdx)
{
double t = tIdx*1.0/segments;
cout << "[" << tIdx << "]" << " ";
cout << "[" << t*100 << "%]" << " ";
cout << " --> " << Eval(idx,t);
cout << endl;
}
}
}
};
class ClassicSpline : public SplineBase
{
private:
/* The system of linear equations found by solving
* for the 3rd-order spline polynomial is given by:
* A*x = b. The "x" is represented by _xCol and the
* "b" is represented by _bCol in the code.
*
* The "A" is formulated with diagonal elements (_diagElems) and
* symmetric off-diagonal elements (_offDiagElems). The
* general structure (for six points) looks like:
*
*
* | d1 u1 0 0 0 | | p1 | | w1 |
* | u1 d2 u2 0 0 | | p2 | | w2 |
* | 0 u2 d3 u3 0 | * | p3 | = | w3 |
* | 0 0 u3 d4 u4 | | p4 | | w4 |
* | 0 0 0 u4 d5 | | p5 | | w5 |
*
*
* The general derivation for this can be found
* in Robert Sedgewick's "Algorithms in C++".
*
*/
vector<double> _xCol;
vector<double> _bCol;
vector<double> _diagElems;
vector<double> _offDiagElems;
public:
ClassicSpline()
{
_xCol.reserve(NOM_SIZE);
_bCol.reserve(NOM_SIZE);
_diagElems.reserve(NOM_SIZE);
_offDiagElems.reserve(NOM_SIZE);
}
/* Evaluate the spline for the ith segment
* for parameter. The value of parameter t must
* be between 0 and 1.
*/
inline virtual Vec2 Eval(int seg, double t)
{
const vector<Vec2>& points = GetPoints();
assert(t >= 0);
assert(t <= 1.0);
assert(seg >= 0);
assert(seg < (points.size()-1));
const double ONE_OVER_SIX = 1.0/6.0;
double oneMinust = 1.0 - t;
double t3Minust = t*t*t-t;
double oneMinust3minust = oneMinust*oneMinust*oneMinust-oneMinust;
double deltaX = points[seg+1].x - points[seg].x;
// Natural cubic spline: the two second-derivative terms are added.
double yValue = t * points[seg + 1].y +
oneMinust*points[seg].y +
ONE_OVER_SIX*deltaX*deltaX*(t3Minust*_xCol[seg+1] + oneMinust3minust*_xCol[seg]);
double xValue = t*(points[seg+1].x-points[seg].x) + points[seg].x;
return Vec2(xValue,yValue);
}
/* Clear out all the data.
*/
virtual void ResetDerived()
{
_diagElems.clear();
_bCol.clear();
_xCol.clear();
_offDiagElems.clear();
}
virtual bool ComputeSpline()
{
const vector<Vec2>& p = GetPoints();
_bCol.resize(p.size());
_xCol.resize(p.size());
_diagElems.resize(p.size());
// reserve() does not change size(), so resize before indexing.
_offDiagElems.resize(p.size());
// Each loop stops one short of the end because it reads p[idx+1].
for(int idx = 1; idx + 1 < (int)p.size(); ++idx)
{
_diagElems[idx] = 2*(p[idx+1].x-p[idx-1].x);
}
for(int idx = 0; idx + 1 < (int)p.size(); ++idx)
{
_offDiagElems[idx] = p[idx+1].x - p[idx].x;
}
for(int idx = 1; idx + 1 < (int)p.size(); ++idx)
{
_bCol[idx] = 6.0*((p[idx+1].y-p[idx].y)/_offDiagElems[idx] -
(p[idx].y-p[idx-1].y)/_offDiagElems[idx-1]);
}
_xCol[0] = 0.0;
_xCol[p.size()-1] = 0.0;
for(int idx = 1; idx < p.size()-1; ++idx)
{
_bCol[idx+1] = _bCol[idx+1] - _bCol[idx]*_offDiagElems[idx]/_diagElems[idx];
_diagElems[idx+1] = _diagElems[idx+1] - _offDiagElems[idx]*_offDiagElems[idx]/_diagElems[idx];
}
for(int idx = (int)p.size()-2; idx > 0; --idx)
{
_xCol[idx] = (_bCol[idx] - _offDiagElems[idx]*_xCol[idx+1])/_diagElems[idx];
}
return true;
}
};
/* Bezier Spline Implementation
* Based on this article:
* http://www.particleincell.com/blog/2012/bezier-splines/
*/
class BezierSpine : public SplineBase
{
private:
vector<Vec2> _p1Points;
vector<Vec2> _p2Points;
public:
BezierSpine()
{
_p1Points.reserve(NOM_SIZE);
_p2Points.reserve(NOM_SIZE);
}
/* Evaluate the spline for the ith segment
* for parameter. The value of parameter t must
* be between 0 and 1.
*/
inline virtual Vec2 Eval(int seg, double t)
{
assert(seg < _p1Points.size());
assert(seg < _p2Points.size());
double omt = 1.0 - t;
Vec2 p0 = GetPoints()[seg];
Vec2 p1 = _p1Points[seg];
Vec2 p2 = _p2Points[seg];
Vec2 p3 = GetPoints()[seg+1];
double xVal = omt*omt*omt*p0.x + 3*omt*omt*t*p1.x +3*omt*t*t*p2.x+t*t*t*p3.x;
double yVal = omt*omt*omt*p0.y + 3*omt*omt*t*p1.y +3*omt*t*t*p2.y+t*t*t*p3.y;
return Vec2(xVal,yVal);
}
/* Clear out all the data.
*/
virtual void ResetDerived()
{
_p1Points.clear();
_p2Points.clear();
}
virtual bool ComputeSpline()
{
const vector<Vec2>& p = GetPoints();
int N = (int)p.size()-1;
// Need at least two points; check before resize() so N is never negative there.
if(N <= 0)
return false;
_p1Points.resize(N);
_p2Points.resize(N);
if(N == 1)
{ // Only 2 points...just create a straight line.
// Constraint: 3*P1 = 2*P0 + P3
_p1Points[0] = (2.0/3.0*p[0] + 1.0/3.0*p[1]);
// Constraint: P2 = 2*P1 - P0
_p2Points[0] = 2.0*_p1Points[0] - p[0];
return true;
}
/*rhs vector*/
vector<Vec2> a(N);
vector<Vec2> b(N);
vector<Vec2> c(N);
vector<Vec2> r(N);
/*left most segment*/
a[0].x = 0;
b[0].x = 2;
c[0].x = 1;
r[0].x = p[0].x+2*p[1].x;
a[0].y = 0;
b[0].y = 2;
c[0].y = 1;
r[0].y = p[0].y+2*p[1].y;
/*internal segments*/
for (int i = 1; i < N - 1; i++)
{
a[i].x=1;
b[i].x=4;
c[i].x=1;
r[i].x = 4 * p[i].x + 2 * p[i+1].x;
a[i].y=1;
b[i].y=4;
c[i].y=1;
r[i].y = 4 * p[i].y + 2 * p[i+1].y;
}
/*right segment*/
a[N-1].x = 2;
b[N-1].x = 7;
c[N-1].x = 0;
r[N-1].x = 8*p[N-1].x+p[N].x;
a[N-1].y = 2;
b[N-1].y = 7;
c[N-1].y = 0;
r[N-1].y = 8*p[N-1].y+p[N].y;
/*solves Ax=b with the Thomas algorithm (from Wikipedia)*/
for (int i = 1; i < N; i++)
{
double m;
m = a[i].x/b[i-1].x;
b[i].x = b[i].x - m * c[i - 1].x;
r[i].x = r[i].x - m * r[i-1].x;
m = a[i].y/b[i-1].y;
b[i].y = b[i].y - m * c[i - 1].y;
r[i].y = r[i].y - m * r[i-1].y;
}
_p1Points[N-1].x = r[N-1].x/b[N-1].x;
_p1Points[N-1].y = r[N-1].y/b[N-1].y;
for (int i = N - 2; i >= 0; --i)
{
_p1Points[i].x = (r[i].x - c[i].x * _p1Points[i+1].x) / b[i].x;
_p1Points[i].y = (r[i].y - c[i].y * _p1Points[i+1].y) / b[i].y;
}
/*we have p1, now compute p2*/
for (int i=0;i<N-1;i++)
{
_p2Points[i].x=2*p[i+1].x-_p1Points[i+1].x;
_p2Points[i].y=2*p[i+1].y-_p1Points[i+1].y;
}
_p2Points[N-1].x = 0.5 * (p[N].x+_p1Points[N-1].x);
_p2Points[N-1].y = 0.5 * (p[N].y+_p1Points[N-1].y);
return true;
}
virtual void DumpDerived()
{
cout << " Control Points " << endl;
for(int idx = 0; idx < _p1Points.size(); idx++)
{
cout << "[" << idx << "] ";
cout << "P1: " << _p1Points[idx];
cout << " ";
cout << "P2: " << _p2Points[idx];
cout << endl;
}
}
};
#endif /* defined(__SplineCommon__) */
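The tridiagonal solve inside BezierSpine::ComputeSpline() can also be read on its own. Here is a minimal standalone sketch of the Thomas algorithm for scalar coefficients; the function name solve_tridiagonal and its signature are assumptions and are not part of the header above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Solves a tridiagonal system with the Thomas algorithm.
// a: sub-diagonal (a[0] unused), b: diagonal, c: super-diagonal
// (c[n-1] unused), r: right-hand side. b and r are taken by value
// because they are overwritten during elimination.
std::vector<double> solve_tridiagonal(const std::vector<double>& a,
                                      std::vector<double> b,
                                      const std::vector<double>& c,
                                      std::vector<double> r)
{
    const std::size_t n = b.size();
    assert(n > 0 && a.size() == n && c.size() == n && r.size() == n);

    // Forward elimination: remove the sub-diagonal entry of each row.
    for (std::size_t i = 1; i < n; ++i)
    {
        assert(b[i - 1] != 0.0); // no pivoting in this sketch
        const double m = a[i] / b[i - 1];
        b[i] -= m * c[i - 1];
        r[i] -= m * r[i - 1];
    }

    // Back substitution, from the last unknown to the first.
    std::vector<double> x(n);
    assert(b[n - 1] != 0.0);
    x[n - 1] = r[n - 1] / b[n - 1];
    for (std::size_t i = n - 1; i-- > 0;)
    {
        x[i] = (r[i] - c[i] * x[i + 1]) / b[i];
    }
    return x;
}
```

In the class above the same recurrence is run twice per row, once for the x components and once for the y components, since both share the same matrix.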
Some Notes
The classic spline will crash if you give it a vertical set of
points. That is why I created the Bezier...I have lots of vertical
lines/paths to follow.
The base class has an option to remove colinear points as you add
them. This uses a simple slope comparison of two lines to figure out
if they are on the same line. You don't have to do this, but for
long paths that are straight lines, it cuts down on cycles. When you
do a lot of pathfinding on a regular-spaced graph, you tend to get a
lot of continuous segments.
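As a side note on the colinearity check, a common way to express it without dividing is the cross product of the two direction vectors, which also copes with vertical runs. The sketch below is a generic version under that assumption, not the exact test used in AddPoint(); P2 is a hypothetical stand-in for Vec2, and the tolerance eps is an arbitrary choice.

```cpp
#include <cassert>
#include <cmath>

struct P2
{
    double x;
    double y;
};

// True when a, b and c lie (nearly) on one line. The cross product of
// (b - a) and (c - b) is zero for colinear points, and unlike a slope
// ratio it needs no division, so vertical runs are handled too.
bool is_colinear(const P2& a, const P2& b, const P2& c, double eps = 1e-9)
{
    assert(eps >= 0.0);
    const double cross = (b.x - a.x) * (c.y - b.y) - (b.y - a.y) * (c.x - b.x);
    return std::fabs(cross) <= eps;
}
```

With three points on the line y = x, or three points stacked vertically, the function returns true; moving the last point off the line makes it return false.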
Here is an example of using the Bezier Spline:
/* Smooth the points on the path so that turns look
* more natural. We'll only smooth the first few
* points. Most of the time, the full path will not
* be executed anyway...why waste cycles.
*/
void SmoothPath(vector<Vec2>& path, int32 divisions)
{
const int SMOOTH_POINTS = 6;
BezierSpine spline;
if(path.size() < 2)
return;
// Cache off the first point. If the first point is removed,
// then we occasionally run into problems if the collision detection
// says the first node is occupied but the splined point is too
// close, so the FSM "spins" trying to find a sensor cell that is
// not occupied.
// Vec2 firstPoint = path.back();
// path.pop_back();
// Grab the points.
for(int idx = 0; idx < SMOOTH_POINTS && path.size() > 0; idx++)
{
spline.AddPoint(path.back());
path.pop_back();
}
// Smooth them.
spline.ComputeSpline();
// Push them back in.
for(int idx = spline.GetPoints().size()-2; idx >= 0; --idx)
{
for(int division = divisions-1; division >= 0; --division)
{
double t = division*1.0/divisions;
path.push_back(spline.Eval(idx, t));
}
}
// Push back in the original first point.
// path.push_back(firstPoint);
}
Notes
While the whole path could be smoothed, in this application, since
the path was changing every so often, it was better to just smooth
the first points and then connect it up.
The points are loaded in "reverse" order into the path vector. This
may or may not save cycles (I've slept since then).
This code is part of a much larger code base, but you can download it all on github and see a blog entry about it here.
You can look at this in action in this video.