Gradient descent converging towards the wrong value - c++

I'm trying to implement a gradient descent algorithm in C++. Here's the code I have so far:
#include <iostream>

double X[] {163,169,158,158,161,172,156,161,154,145};
double Y[] {52, 68, 49, 73, 71, 99, 50, 82, 56, 46};
double m, p;
int n = sizeof(X)/sizeof(X[0]);

double Loss_function(void);
void gradientStep(double alpha);

int main(void) {
    double alpha = 0.00004; // 0.00007;
    // initial guess from the first two points
    m = (Y[1] - Y[0]) / (X[1] - X[0]);
    p = Y[0] - m * X[0];
    for (int i = 1; i <= 8; i++) {
        gradientStep(alpha);
    }
    return 0;
}

double Loss_function(void) {
    double res = 0;
    double tmp;
    for (int i = 0; i < n; i++) {
        tmp = Y[i] - m * X[i] - p;
        res += tmp * tmp;
    }
    return res / 2.0 / (double)n;
}

void gradientStep(double alpha) {
    double pg = 0, mg = 0;
    for (int i = 0; i < n; i++) {
        pg += Y[i] - m * X[i] - p;
        mg += X[i] * (Y[i] - m * X[i] - p);
    }
    p += alpha * pg / n;
    m += alpha * mg / n;
}
This code converges towards m = 2.79822, p = -382.666, with an error of 102.88. But if I use my calculator to work out the correct linear regression model, I find that m and p should be 1.601 and -191.1, respectively.
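For what it's worth, the closed-form least-squares fit over this data set gives the same numbers: with x̄ = 159.7 and ȳ = 64.6, m = sum((x_i - x̄) * (y_i - ȳ)) / sum((x_i - x̄)^2) ≈ 832.8 / 520.1 ≈ 1.601, and p = ȳ - m * x̄ ≈ -191.1.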
I also noticed that the algorithm won't converge for alpha > 0.00007, which seems quite low, and the value of p barely changes during the 8 iterations (or even after 2000 iterations).
What's wrong with my code?
Here's a good overview of the algorithm I'm trying to implement. The values of theta0 and theta1 are called p and m in my program.
Other implementation in python
More about the algorithm
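For reference, the standard batch-gradient-descent updates for this model, written with my variable names (p = theta0, m = theta1), are:

p := p - alpha * (1/n) * sum_i (m * x_i + p - y_i)
m := m - alpha * (1/n) * sum_i ((m * x_i + p - y_i) * x_i)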

This link gives a comprehensive view of the algorithm; it turns out I was following a completely wrong approach.
The following code does not work properly (and I have no plans to work on it further), but it should put anyone who runs into the same problem as me on the right track:
#include <vector>
#include <iostream>

typedef std::vector<double> vect;

std::vector<double> y, omega(2, 0), omega2(2, 0);
std::vector<std::vector<double>> X;
int n = 10;

void gradientStep(double alpha);
void display(void);

int main(void) {
    /* Initialize X so that each member contains (1, x_i) */
    /* Initialize y so that each member contains y_i */
    double alpha = 0.00001;
    display();
    for (int i = 1; i <= 8; i++) {
        gradientStep(alpha);
        display();
    }
    return 0;
}

double f_function(const std::vector<double> &x) {
    double c = 0;
    for (unsigned int i = 0; i < omega.size(); i++) {
        c += omega[i] * x[i];
    }
    return c;
}

void gradientStep(double alpha) {
    for (int i = 0; i < n; i++) {
        for (unsigned int j = 0; j < X[0].size(); j++) {
            omega2[j] -= alpha/(double)n * (f_function(X[i]) - y[i]) * X[i][j];
        }
    }
    omega = omega2;
}

void display(void) {
    double res = 0, tmp = 0;
    for (int i = 0; i < n; i++) {
        tmp = y[i] - f_function(X[i]);
        res += tmp * tmp; // Loss function
    }
    std::cout << "omega = ";
    for (unsigned int i = 0; i < omega.size(); i++) {
        std::cout << "[" << omega[i] << "] ";
    }
    std::cout << "\tError : " << res * .5/(double)n << std::endl;
}
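For completeness, here is a minimal, self-contained sketch of the same vectorised batch update applied directly to the data set from the question (my own illustration, not the code above; the implicit column of ones plays the role of the intercept):

#include <cstdio>
#include <vector>

int main() {
    std::vector<double> xs {163,169,158,158,161,172,156,161,154,145};
    std::vector<double> ys {52, 68, 49, 73, 71, 99, 50, 82, 56, 46};
    std::size_t n = xs.size();
    double omega[2] = {0.0, 0.0}; // omega[0] = intercept p, omega[1] = slope m
    double alpha = 0.00001;
    for (int step = 1; step <= 8; ++step) {
        double grad[2] = {0.0, 0.0};
        for (std::size_t i = 0; i < n; ++i) {
            double err = omega[0] + omega[1] * xs[i] - ys[i]; // f(x_i) - y_i
            grad[0] += err;         // derivative w.r.t. the intercept
            grad[1] += err * xs[i]; // derivative w.r.t. the slope
        }
        omega[0] -= alpha / n * grad[0];
        omega[1] -= alpha / n * grad[1];
        std::printf("step %d: p = %g, m = %g\n", step, omega[0], omega[1]);
    }
    return 0;
}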

Fixing Neural Net vanishing gradients problem?

This is going to be a long one. I am still very new to coding (I started 3 months ago), so I know my code is not perfect; any criticism beyond the question is more than welcome. I have specifically avoided using pointers because I do not fully understand them. I can use them, but I don't trust that I will use them correctly in a program like this.
First things first, I have a version of this where there is only 1 hidden layer and the net works perfectly. I started running into problems when I tried to expand the number of hidden layers.
Some info on the net:
-I am using softmax output activation as I have 3 output neurons.
-I am using tanh as my activation function on the rest of the net.
-The file being read for the input has a format of
"input: 0.56 0.76 0.23 0.67"
"output: 0.0 0.0 1.0" (this is the target)
-The weights connecting a layer 1 neuron to a layer 2 neuron are stored in the layer 1 neuron.
-The bias for each neuron is stored in that neuron.
-The target is 1.0 0.0 0.0 if the sum of the input numbers is below 1, 0.0 1.0 0.0 if the sum is between 1 and 2, and 0.0 0.0 1.0 if the sum is above 2 (a small sketch of this rule follows the list).
-Using L1 regularization.
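A minimal sketch of that labelling rule, just to make it concrete (my own illustration, not code from the net):

#include <numeric>
#include <vector>

// Hypothetical helper: build the one-hot target from the sum of the inputs.
std::vector<double> make_target(const std::vector<double>& inputs) {
    double sum = std::accumulate(inputs.begin(), inputs.end(), 0.0);
    std::vector<double> target(3, 0.0);
    if (sum < 1.0)
        target[0] = 1.0; // 1.0 0.0 0.0
    else if (sum <= 2.0)
        target[1] = 1.0; // 0.0 1.0 0.0
    else
        target[2] = 1.0; // 0.0 0.0 1.0
    return target;
}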
Those problems specifically being:
The softmax output values do not move from a relatively equalised range, i.e.:
(positions 1 and 2 in the target vector have a roughly 50/50 occurrence rate, while position 3 has less than a 3% occurrence rate). So by "relatively equalised" I mean the softmax output generally looks something like
"0.56.... 0.48.... 0.02..." even after 500 epochs.
The weights at the hidden layer closest to the input layer don't change much at all, which is what I think vanishing gradients are. I might be wrong on this. But the weights at the hidden layer closest to the output are ending up between -50 and 50 (which I think is okay?).
Things I have tried:
I have tried using ReLU, parametric ReLU, and exponential ReLU, but with all of these the softmax output value for neuron 3 keeps rising while the other 2 neurons' values keep falling. These values continue on that trajectory until either 500 epochs have been reached or they just turn into NaNs. (I think this is due to the structure of my code rather than the ReLU function itself.)
If I set the number of hidden layers above 3 while using ReLU, it immediately spits out NaNs within the first epoch.
The backprop function is pretty long, but this is specifically because I have deconstructed it many times over to try and figure out where I might be mismatching values. I do have it in a condensed version, but I feel I have a higher chance of being completely off the mark there than with the deconstructed one.
I have included the ReLU function code that I used. It is the first time I have used it, so I might be wrong on that as well, but I don't think so; I have double-checked multiple times. The "Relu" in the code is specifically ELU, i.e. exponential ReLU.
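For reference, my reading of that ELU (with alpha = 0.01) is:

f(x) = x                 for x >= 0
f(x) = 0.01 * (e^x - 1)  for x < 0

so for x < 0 the derivative is 0.01 * e^x = f(x) + 0.01, which is what the Reluderiv function below computes.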
here is the code for the net:
#include <iostream>
#include <fstream>
#include <cmath>
#include <vector>
#include <sstream>
#include <random>
#include <string>
#include <iomanip>
double randomt(double x, double y)
{
std::random_device rd;
std::mt19937 mt(rd());
std::uniform_real_distribution<double> dist(x, y);
return dist(mt);
}
class InputN
{
public:
double val{};
std::vector <double> weights{};
};
class HiddenN
{
public:
double preactval{};
double actval{};
double actvalPD{};
double preactvalpd{};
std::vector <double> weights{};
double bias{};
};
class OutputN
{
public:
double preactval{};
double actval{};
double preactvalpd{};
double bias{};
};
class Net
{
public:
std::vector <InputN> inneurons{};
std::vector <std::vector <HiddenN>> hiddenneurons{};
std::vector <OutputN> outputneurons{};
double lambda{ 0.015 };
double alpha{ 0.02 };
};
double tanhderiv(double val)
{
return 1 - tanh(val) * tanh(val);
}
double Relu(double val)
{
if (val < 0) return 0.01 *(exp(val) - 1);
else return val;
}
double Reluderiv(double val)
{
if (val < 0) return Relu(val) + 0.01;
else return 1;
}
double regularizer(double weight)
{
double absval{};
if (weight < 0) absval = weight - weight - weight;
else if (weight > 0 || weight == 0) absval = weight;
else;
if (absval > 0) return 1;
else if (absval < 0) return -1;
else if (absval == 0) return 0;
else return 2;
}
void feedforward(Net& net)
{
double sum{};
int prevlayer{};
for (size_t Hsize = 0; Hsize < net.hiddenneurons.size(); Hsize++)
{
//std::cout << "in first loop" << '\n';
prevlayer = Hsize - 1;
for (size_t Hel = 0; Hel < net.hiddenneurons[Hsize].size(); Hel++)
{
//std::cout << "in second loop" << '\n';
if (Hsize == 0)
{
//std::cout << "in first if" << '\n';
for (size_t Isize = 0; Isize < net.inneurons.size(); Isize++)
{
//std::cout << "in fourth loop" << '\n';
sum += (net.inneurons[Isize].val * net.inneurons[Isize].weights[Hel]);
}
net.hiddenneurons[Hsize][Hel].preactval = net.hiddenneurons[Hsize][Hel].bias + sum;
net.hiddenneurons[Hsize][Hel].actval = tanh(sum);
sum = 0;
//std::cout << "first if done" << '\n';
}
else
{
//std::cout << "in else" << '\n';
for (size_t prs = 0; prs < net.hiddenneurons[prevlayer].size(); prs++)
{
//std::cout << "in fourth loop" << '\n';
sum += net.hiddenneurons[prevlayer][prs].actval * net.hiddenneurons[prevlayer][prs].weights[Hel];
}
//std::cout << "fourth loop done" << '\n';
net.hiddenneurons[Hsize][Hel].preactval = net.hiddenneurons[Hsize][Hel].bias + sum;
net.hiddenneurons[Hsize][Hel].actval = tanh(sum);
//std::cout << "else done" << '\n';
sum = 0;
}
}
}
//std::cout << "first loop done " << '\n';
int lasthid = net.hiddenneurons.size() - 1;
for (size_t Osize = 0; Osize < net.outputneurons.size(); Osize++)
{
for (size_t Hsize = 0; Hsize < net.hiddenneurons[lasthid].size(); Hsize++)
{
sum += (net.hiddenneurons[lasthid][Hsize].actval * net.hiddenneurons[lasthid][Hsize].weights[Osize]);
}
net.outputneurons[Osize].preactval = net.outputneurons[Osize].bias + sum;
}
}
void softmax(Net& net)
{
double sum{};
for (size_t Osize = 0; Osize < net.outputneurons.size(); Osize++)
{
sum += exp(net.outputneurons[Osize].preactval);
}
for (size_t Osize = 0; Osize < net.outputneurons.size(); Osize++)
{
net.outputneurons[Osize].actval = exp(net.outputneurons[Osize].preactval) / sum;
}
}
void lossfunc(Net& net, std::vector <double> target)
{
int pos{ -1 };
double val{};
for (size_t t = 0; t < target.size(); t++)
{
pos += 1;
if (target[t] > 0)
{
break;
}
}
for (size_t s = 0; net.outputneurons.size(); s++)
{
val = -log(net.outputneurons[pos].actval);
}
}
void backprop(Net& net, std::vector<double>& target)
{
for (size_t outI = 0; outI < net.outputneurons.size(); outI++)
{
double PD = target[outI] - net.outputneurons[outI].actval;
net.outputneurons[outI].preactvalpd = PD * -1;
}
size_t lasthid = net.hiddenneurons.size() - 1;
for (size_t LH = 0; LH < net.hiddenneurons[lasthid].size(); LH++)
{
for (size_t LHW = 0; LHW < net.hiddenneurons[lasthid][LH].weights.size(); LHW++)
{
double weight = net.hiddenneurons[lasthid][LH].weights[LHW];
double PD = net.outputneurons[LHW].preactvalpd * net.hiddenneurons[lasthid][LH].actval;
PD = PD * -1;
double delta = PD - (net.lambda * regularizer(weight));
weight = weight + (net.alpha * delta);
net.hiddenneurons[lasthid][LH].weights[LHW] = weight;
}
}
for (size_t OB = 0; OB < net.outputneurons.size(); OB++)
{
double bias = net.outputneurons[OB].bias;
double BPD = net.outputneurons[OB].preactvalpd;
BPD = BPD * -1;
double Delta = BPD;
bias = bias + (net.alpha * Delta);
}
for (size_t HPD = 0; HPD < net.hiddenneurons[lasthid].size(); HPD++)
{
double PD{};
for (size_t HW = 0; HW < net.outputneurons.size(); HW++)
{
PD += net.hiddenneurons[lasthid][HPD].weights[HW] * net.outputneurons[HW].preactvalpd;
}
net.hiddenneurons[lasthid][HPD].actvalPD = PD;
PD = 0;
}
for (size_t HPD = 0; HPD < net.hiddenneurons[lasthid].size(); HPD++)
{
net.hiddenneurons[lasthid][HPD].preactvalpd = net.hiddenneurons[lasthid][HPD].actvalPD * tanhderiv(net.hiddenneurons[lasthid][HPD].preactval);
}
for (size_t AllHid = net.hiddenneurons.size() - 2; AllHid > -1; AllHid--)
{
size_t uplayer = AllHid + 1;
for (size_t cl = 0; cl < net.hiddenneurons[AllHid].size(); cl++)
{
for (size_t clw = 0; clw < net.hiddenneurons[AllHid][cl].weights.size(); clw++)
{
double weight = net.hiddenneurons[AllHid][cl].weights[clw];
double PD = net.hiddenneurons[uplayer][clw].preactvalpd * net.hiddenneurons[AllHid][cl].actval;
PD = PD * -1;
double delta = PD - (net.lambda * regularizer(weight));
weight = weight + (net.alpha * delta);
net.hiddenneurons[AllHid][cl].weights[clw] = weight;
}
}
for (size_t up = 0; up < net.hiddenneurons[uplayer].size(); up++)
{
double bias = net.hiddenneurons[uplayer][up].bias;
double PD = net.hiddenneurons[uplayer][up].preactvalpd;
PD = PD * -1;
double delta = PD;
bias = bias + (net.alpha * delta);
}
for (size_t APD = 0; APD < net.hiddenneurons[AllHid].size(); APD++)
{
double PD{};
for (size_t APDW = 0; APDW < net.hiddenneurons[AllHid][APD].weights.size(); APDW++)
{
PD += net.hiddenneurons[AllHid][APD].weights[APDW] * net.hiddenneurons[uplayer][APDW].preactvalpd;
}
net.hiddenneurons[AllHid][APD].actvalPD = PD;
PD = 0;
}
for (size_t PPD = 0; PPD < net.hiddenneurons[AllHid].size(); PPD++)
{
double PD = net.hiddenneurons[AllHid][PPD].actvalPD * tanhderiv(net.hiddenneurons[AllHid][PPD].preactval);
net.hiddenneurons[AllHid][PPD].preactvalpd = PD;
}
}
for (size_t IN = 0; IN < net.inneurons.size(); IN++)
{
for (size_t INW = 0; INW < net.inneurons[IN].weights.size(); INW++)
{
double weight = net.inneurons[IN].weights[INW];
double PD = net.hiddenneurons[0][INW].preactvalpd * net.inneurons[IN].val;
PD = PD * -1;
double delta = PD - (net.lambda * regularizer(weight));
weight = weight + (net.alpha * delta);
net.inneurons[IN].weights[INW] = weight;
}
}
for (size_t hidB = 0; hidB < net.hiddenneurons[0].size(); hidB++)
{
double bias = net.hiddenneurons[0][hidB].bias;
double PD = net.hiddenneurons[0][hidB].preactvalpd;
PD = PD * -1;
double delta = PD;
bias = bias + (net.alpha * delta);
net.hiddenneurons[0][hidB].bias = bias;
}
}
int main()
{
std::vector <double> invals{ };
std::vector <double> target{ };
Net net;
InputN Ineuron;
HiddenN Hneuron;
OutputN Oneuron;
int IN = 4;
int HIDLAYERS = 4;
int HID = 8;
int OUT = 3;
for (int i = 0; i < IN; i++)
{
net.inneurons.push_back(Ineuron);
for (int m = 0; m < HID; m++)
{
net.inneurons.back().weights.push_back(randomt(0.0, 0.5));
}
}
//std::cout << "first loop done" << '\n';
for (int s = 0; s < HIDLAYERS; s++)
{
net.hiddenneurons.push_back(std::vector <HiddenN>());
if (s == HIDLAYERS - 1)
{
for (int i = 0; i < HID; i++)
{
net.hiddenneurons[s].push_back(Hneuron);
for (int m = 0; m < OUT; m++)
{
net.hiddenneurons[s].back().weights.push_back(randomt(0.0, 0.5));
}
net.hiddenneurons[s].back().bias = 1.0;
}
}
else
{
for (int i = 0; i < HID; i++)
{
net.hiddenneurons[s].push_back(Hneuron);
for (int m = 0; m < HID; m++)
{
net.hiddenneurons[s].back().weights.push_back(randomt(0.0, 0.5));
}
net.hiddenneurons[s].back().bias = 1.0;
}
}
}
//std::cout << "second loop done" << '\n';
for (int i = 0; i < OUT; i++)
{
net.outputneurons.push_back(Oneuron);
net.outputneurons.back().bias = randomt(0.0, 0.5);
}
//std::cout << "third loop done" << '\n';
int count{};
std::ifstream fileread("N.txt");
for (int epoch = 0; epoch < 500; epoch++)
{
count = 0;
if (epoch == 100 || epoch == 100 * 2 || epoch == 100 * 3 || epoch == 100 * 4 || epoch == 499)
{
printvals("no", net);
}
fileread.clear(); fileread.seekg(0, std::ios::beg);
while (fileread.is_open())
{
std::cout << '\n' << "epoch: " << epoch << '\n';
std::string fileline{};
fileread >> fileline;
if (fileline == "in:")
{
std::string input{};
double nums{};
std::getline(fileread, input);
std::stringstream ss(input);
while (ss >> nums)
{
invals.push_back(nums);
}
}
if (fileline == "out:")
{
std::string output{};
double num{};
std::getline(fileread, output);
std::stringstream ss(output);
while (ss >> num)
{
target.push_back(num);
}
}
count += 1;
if (count == 2)
{
for (size_t inv = 0; inv < invals.size(); inv++)
{
net.inneurons[inv].val = invals[inv];
}
//std::cout << "calling feedforward" << '\n';
feedforward(net);
//std::cout << "ff done" << '\n';
softmax(net);
printvals("output", net);
std::cout << "target: " << '\n';
for (auto element : target) std::cout << element << " / ";
std::cout << '\n';
backprop(net, target);
invals.clear();
target.clear();
count = 0;
}
if (fileread.eof()) break;
}
}
//std::cout << "fourth loop done" << '\n';
return 1;
}
Much appreciated to anyone who actually made it through all that! :)

C++ neural network implemented from scratch cannot get above 50% on MNIST

So I have implemented a fully connected one hidden layer neural network in C++ using Eigen for matrix multiplication. It uses minibatch gradient descent.
However, my model cannot get above 50% accuracy on MNIST. I have tried learning rates between 0.0001 and 10. The model does overfit on training sizes < 100 (with ~90% accuracy, which is still pretty bad), albeit extremely slowly.
What might be causing this low accuracy and extremely slow learning? My main concern is that the backpropagation is incorrect. Furthermore, I would prefer not to add any other optimization techniques (learning rate schedule, regularization, etc.).
Feed forward and backprop code:
z1 = (w1 * mbX).colwise() + b1;
a1 = sigmoid(z1);
z2 = (w2 * a1).colwise() + b2;
a2 = sigmoid(z2);
MatrixXd err = ((double) epsilon)/((double) minibatch_size) * ((a2 - mbY).array() * sigmoid_derivative(z2).array()).matrix();
b2 = b2 - err * ones;
w2 = w2 - (err * a1.transpose());
err = ((w2.transpose() * err).array() * sigmoid_derivative(z1).array()).matrix();
b1 = b1 - err * ones;
w1 = w1 - (err * mbX.transpose());
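For reference, the usual minibatch backprop equations for a quadratic cost, written in the same notation (∘ is the element-wise product), are:

delta2 = (a2 - mbY) ∘ sigmoid'(z2)
delta1 = (w2^T * delta2) ∘ sigmoid'(z1)
dC/dw2 = delta2 * a1^T,   dC/db2 = sum of the columns of delta2
dC/dw1 = delta1 * mbX^T,  dC/db1 = sum of the columns of delta1

with each parameter then decreased by epsilon/minibatch_size times its gradient (the code folds that factor into err right at the start).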
Full program code:
#include <iostream>
#include <fstream>
#include <math.h>
#include <cstdlib>
#include <Eigen/Dense>
#include <vector>
#include <string>
using namespace Eigen;
#define N 30
#define epsilon 0.7
#define epoch 1000
//sizes
const int minibatch_size = 10;
const int training_size = 10000;
const int val_size = 10;
unsigned int num, magic, rows, cols;
//images
unsigned int image[training_size][28][28];
unsigned int val_image[val_size][28][28];
//labels
unsigned int label[training_size];
unsigned int val_label[val_size];
//inputs
MatrixXd X(784, training_size);
MatrixXd Y = MatrixXd::Zero(10, training_size);
//minibatch
MatrixXd mbX(784, minibatch_size);
MatrixXd mbY = MatrixXd::Zero(10, minibatch_size);
//validation
MatrixXd Xv(784, val_size);
MatrixXd Yv = MatrixXd::Zero(10, val_size);
//Image processing courtesy of https://stackoverflow.com/users/11146076/%e5%bc%a0%e4%ba%91%e9%93%ad
unsigned int in(std::ifstream& icin, unsigned int size) {
unsigned int ans = 0;
for (int i = 0; i < size; i++) {
unsigned char x;
icin.read((char*)&x, 1);
unsigned int temp = x;
ans <<= 8;
ans += temp;
}
return ans;
}
void input(std::string ipath, std::string lpath, std::string ipath2, std::string lpath2) {
std::ifstream icin;
//training data
icin.open(ipath, std::ios::binary);
magic = in(icin, 4), num = in(icin, 4), rows = in(icin, 4), cols = in(icin, 4);
for (int i = 0; i < training_size; i++) {
int val = 0;
for (int x = 0; x < rows; x++) {
for (int y = 0; y < cols; y++) {
image[i][x][y] = in(icin, 1);
X(val, i) = image[i][x][y]/255;
val++;
}
}
}
icin.close();
//training labels
icin.open(lpath, std::ios::binary);
magic = in(icin, 4), num = in(icin, 4);
for (int i = 0; i < training_size; i++) {
label[i] = in(icin, 1);
Y(label[i], i) = 1;
}
icin.close();
//validation data
icin.open(ipath2, std::ios::binary);
magic = in(icin, 4), num = in(icin, 4), rows = in(icin, 4), cols = in(icin, 4);
for (int i = 0; i < val_size; i++) {
int val = 0;
for (int x = 0; x < rows; x++) {
for (int y = 0; y < cols; y++) {
val_image[i][x][y] = in(icin, 1);
Xv(val, i) = val_image[i][x][y]/255;
val++;
}
}
}
icin.close();
//validation labels
icin.open(lpath2, std::ios::binary);
magic = in(icin, 4), num = in(icin, 4);
for (int i = 0; i < val_size; i++) {
val_label[i] = in(icin, 1);
Yv(val_label[i], i) = 1;
}
icin.close();
}
//Neural Network calculations
MatrixXd sigmoid(MatrixXd m) {
m *= -1;
return (1/(1 + m.array().exp())).matrix();
}
MatrixXd sigmoid_derivative(MatrixXd m) {
return (sigmoid(m).array() * (1 - sigmoid(m).array())).matrix();
}
//Initialize weights and biases
//hidden layer
VectorXd b1 = MatrixXd::Zero(N, 1);
MatrixXd w1 = MatrixXd::Random(N, 784);
//output
VectorXd b2 = MatrixXd::Zero(10, 1);
MatrixXd w2 = MatrixXd::Random(10, N);
//Initialize intermediate values
MatrixXd z1, z2, a1, a2, z1v, z2v, a1v, a2v;
MatrixXd ones = MatrixXd::Constant(minibatch_size, 1, 1);
int main() {
input("C:\\Users\\Aaron\\Documents\\Test\\train-images-idx3-ubyte\\train-images.idx3-ubyte", "C:\\Users\\Aaron\\Documents\\Test\\train-labels-idx1-ubyte\\train-labels.idx1-ubyte", "C:\\Users\\Aaron\\Documents\\Test\\t10k-images-idx3-ubyte\\t10k-images.idx3-ubyte", "C:\\Users\\Aaron\\Documents\\Test\\t10k-labels-idx1-ubyte\\t10k-labels.idx1-ubyte");
std::cout << "Finished Image Processing" << std::endl;
//std::cout << w1 << std::endl;
std::vector<double> val_ac;
std::vector<double> c;
std::vector<int> order;
for (int i = 0; i < training_size; i++) {
order.push_back(i);
}
for (int i = 0; i < epoch; i++) {
//feed forward
std::random_shuffle(order.begin(), order.end());
for (int j = 0; j < training_size/minibatch_size; j++) {
for (int k = 0; k < minibatch_size; k++) {
int index = order[j * minibatch_size + k];
mbX.col(k) = X.col(index);
mbY.col(k) = Y.col(index);
}
z1 = (w1 * mbX).colwise() + b1;
a1 = sigmoid(z1);
z2 = (w2 * a1).colwise() + b2;
a2 = sigmoid(z2);
MatrixXd err = ((double) epsilon)/((double) minibatch_size) * ((a2 - mbY).array() * sigmoid_derivative(z2).array()).matrix();
//std::cout << err << std::endl;
b2 = b2 - err * ones;
w2 = w2 - (err * a1.transpose());
err = ((w2.transpose() * err).array() * sigmoid_derivative(z1).array()).matrix();
//std::cout << err << std::endl;
b1 = b1 - err * ones;
w1 = w1 - (err * mbX.transpose());
}
//validation
z1 = (w1 * X).colwise() + b1;
a1 = sigmoid(z1);
z2 = (w2 * a1).colwise() + b2;
a2 = sigmoid(z2);
double cost = 1/((double) training_size) * ((a2 - Y).array() * (a2 - Y).array()).matrix().sum();
c.push_back(cost);
int correct = 0;
for (int i = 0; i < training_size; i++) {
double maxP = -1;
int na;
for (int j = 0; j < 10; j++) {
if (a2(j, i) > maxP) {
maxP = a2(j, i);
na = j;
}
}
if (na == label[i]) correct++;
}
val_ac.push_back(((double) correct) / ((double) training_size));
std::cout << "Finished Epoch " << i + 1 << std::endl;
std::cout << "Cost: " << cost << std::endl;
std::cout << "Accuracy: " << ((double) correct) / ((double) training_size) << std::endl;
}
//plot accuracy
FILE * gp = _popen("gnuplot", "w");
fprintf(gp, "set terminal wxt size 600,400 \n");
fprintf(gp, "set grid \n");
fprintf(gp, "set title '%s' \n", "NN");
fprintf(gp, "plot '-' w line, '-' w lines \n");
for (int i = 0; i < epoch; i++) {
fprintf(gp, "%f %f \n", i + 1.0, c[i]);
}
fprintf(gp, "e\n");
//validation accuracy
for (int i = 0; i < epoch; i++) {
fprintf(gp, "%f %f \n", i + 1.0, val_ac[i]);
}
fprintf(gp, "e\n");
fflush(gp);
system("pause");
_pclose(gp);
return 0;
}
UPD
Here is a graph of the accuracy on the training dataset (green) and the loss (purple)
https://i.stack.imgur.com/Ya2yR.png
Here is a graph of the loss for the training data and validation data:
https://imgur.com/a/4gmFCrk
The loss of the validation data is increasing past a certain point, which shows signs of overfitting. However, the accuracy still remains abysmal even on the training data.
unsigned int val_image[val_size][28][28];
Xv(val, i) = val_image[i][x][y]/255;
Can you try again with Xv(val, i) = val_image[i][x][y] / 255.0;
There too:
X(val, i) = image[i][x][y]/255;
With the code as written, Xv is very often 0, and only 1 when the image has the value 255. With floating-point division, you'll get values between 0.0 and 1.0.
You'll need to check your code for other places where you may be dividing integers.
N.b.: In C++, 240/255 is 0.
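A two-line demonstration of that pitfall:

#include <iostream>

int main() {
    unsigned int pixel = 240;
    std::cout << pixel / 255 << '\n';   // integer division: prints 0
    std::cout << pixel / 255.0 << '\n'; // floating-point division: prints 0.941176
}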

What to do with negative rho values in hough transform?

Here is my code for creating the Hough accumulator for lines in an image:
void hough_lines_acc(cv::Mat img_a_edges, std::vector<std::vector<int> > &hough_acc) {
    for (size_t r = 0; r < img_a_edges.rows; r++) {
        for (size_t c = 0; c < img_a_edges.cols; c++) {
            int theta = static_cast<int> (std::atan2(r, c) * 180 / M_PI);
            int rho = static_cast<int> ((c * cos(theta)) + (r * sin(theta)));
            if (theta < -90) theta = -90;
            if (theta > 89) theta = 89;
            ++hough_acc[abs(rho)][theta];
        }
    }
    cv::Mat img_mat(hough_acc.size(), hough_acc[0].size(), CV_8U);
    std::cout << hough_acc.size() << " " << hough_acc[0].size() << std::endl;
    for (size_t i = 0; i < hough_acc.size(); i++) {
        for (size_t j = 0; j < hough_acc[0].size(); j++) {
            img_mat.at<int> (i,j) = hough_acc[i][j];
        }
    }
    imwrite("../output/ps1-2-b-1.png", img_mat);
}
theta varies from -90 to 89. I am getting negative rho values. Right now I am just replacing each negative rho with a positive one, but I am not getting a correct answer. What should I do with the negative rho values? Please explain the answer.
theta = arctan (y / x)
rho = x * cos(theta) + y * sin(theta)
Edited code:
bool hough_lines_acc(cv::Mat img_a_edges, std::vector<std::vector<int> > &hough_acc,
                     std::vector<double> thetas, std::vector<double> rhos, int rho_resolution, int theta_resolution) {
    int img_w = img_a_edges.cols;
    int img_h = img_a_edges.rows;
    int max_votes = 0;
    int min_votes = INT_MAX;
    for (size_t r = 0; r < img_h; r++) {
        for (size_t c = 0; c < img_w; c++) {
            if (img_a_edges.at<int>(r, c) == 255) {
                for (size_t i = 0; i < thetas.size(); i++) {
                    thetas[i] = (thetas[i] * M_PI / 180);
                    double rho = ( (c * cos(thetas[i])) + (r * sin(thetas[i])) );
                    int buff = ++hough_acc[static_cast<int>(abs(rho))][static_cast<int>(i)];
                    if (buff > max_votes) {
                        max_votes = buff;
                    }
                    if (buff < min_votes) {
                        min_votes = buff;
                    }
                }
            }
        }
    }
    double div = static_cast<double>(max_votes) / 255;
    int threshold = 10;
    int possible_edge = round(static_cast<double>(max_votes) / div) - threshold;
    props({
        {"max votes", max_votes},
        {"min votes", min_votes},
        {"scale", div}
    });
    // needed for scaling intensity for contrast
    // not sure if I am doing it correctly
    for (size_t r = 0; r < hough_acc.size(); r++) {
        for (size_t c = 0; c < hough_acc[0].size(); c++) {
            double val = hough_acc[r][c] / div;
            if (val < 0) {
                val = 0;
            }
            hough_acc[r][c] = static_cast<int>(val);
        }
    }
    cv::Mat img_mat = cv::Mat(hough_acc.size(), hough_acc[0].size(), CV_8UC1, cv::Scalar(0));
    for (size_t i = 0; i < hough_acc.size(); i++) {
        for (size_t j = 0; j < hough_acc[0].size(); j++) {
            img_mat.at<uint8_t> (i,j) = static_cast<uint8_t>(hough_acc[i][j]);
        }
    }
    imwrite("../output/ps1-2-b-1.png", img_mat);
    return true;
}
Still not correct output. What is the error here?
atan2 of two positive numbers... should not be giving you negative angles; it should only be giving you a range of 0-90.
Also, for the Hough transform, I think you want everything relative to one point (i.e. 0,0 in this case). I think for that you would actually want to make theta = 90 - atan2(r, c).
Admittedly, though, I am a bit confused, as I thought you had to encode line direction rather than just "edge pt", i.e. I thought at each edge point you had to provide a discrete array of guessed line trajectories and calculate rho and theta for each one, and throw all of those into your accumulator. As is... I am not sure what you are calculating.
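For what it's worth, here is a rough sketch of the accumulation scheme described above (my own illustration, not the code from the question): every edge pixel votes once per discretised theta, and a negative rho is handled by shifting the row index by the image diagonal instead of taking abs():

#include <cmath>
#include <vector>

void hough_vote(const std::vector<std::vector<int>>& edges,
                std::vector<std::vector<int>>& acc) {
    const double pi = 3.14159265358979323846;
    int rows = static_cast<int>(edges.size());
    int cols = static_cast<int>(edges[0].size());
    int diag = static_cast<int>(std::ceil(std::sqrt(double(rows) * rows + double(cols) * cols)));
    // rho ranges over [-diag, diag]; theta over [-90, 89] degrees
    acc.assign(2 * diag + 1, std::vector<int>(180, 0));
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            if (edges[r][c] != 255) continue; // only edge pixels vote
            for (int t = -90; t < 90; ++t) {
                double rad = t * pi / 180.0;
                double rho = c * std::cos(rad) + r * std::sin(rad);
                ++acc[static_cast<int>(std::round(rho)) + diag][t + 90];
            }
        }
    }
}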

How to implement midpoint displacement

I'm trying to implement procedural generation in my game. I want to really grasp and understand all of the algorithms necessary rather than simply copying/pasting existing code. In order to do this I've attempted to implement 1D midpoint displacement on my own. I've used the information here to write and guide my code. Below is my completed code; it doesn't throw an error, but the results don't appear correct.
srand(time(NULL));
const int lineLength = 65;
float range = 1.0;
float displacedLine[lineLength];
for (int i = 0; i < lineLength; i++)
{
    displacedLine[i] = 0.0;
}
for (int p = 0; p < 100; p++)
{
    int segments = 1;
    for (int i = 0; i < (lineLength / pow(2, 2)); i++)
    {
        int segs = segments;
        for (int j = 0; j < segs; j++)
        {
            int x = floor(lineLength / segs);
            int start = (j * x) + 1;
            int end = start + x;
            if (i == 0)
            {
                end--;
            }
            float lo = -range;
            float hi = +range;
            float change = lo + static_cast <float> (rand()) / (static_cast <float> (RAND_MAX / (hi - lo)));
            int center = ((end - start) / 2) + start;
            displacedLine[center - 1] += change;
            segments++;
        }
        range /= 2;
    }
}
Where exactly have I made mistakes and how might I correct them?
I'm getting results like this:
But I was expecting results like this:
The answer is very simple and by the way I'm impressed you managed to debug all the potential off-by-one errors in your code. The following line is wrong:
displacedLine[center - 1] += change;
You correctly compute the center index and the change amount, but you missed that the midpoint's height should first be set to the average of the segment's endpoints before the change is applied. That is:
displacedLine[center - 1] = (displacedLine[start] + displacedLine[end]) / 2;
displacedLine[center - 1] += change;
I'm sure you get the idea.
The problem seems to be that you are changing only the midpoint of each line segment, rather than changing the rest of the line segment in proportion to its distance from each end to the midpoint. The following code appears to give you something more like what you're looking for:
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <math.h>
#include <algorithm>
using namespace std;

void displaceMidPt (float dline[], int len, float disp) {
    int midPt = len/2;
    float fmidPt = float(midPt);
    for (int i = 1; i <= midPt; i++) {
        float ptDisp = disp * float(i)/fmidPt;
        dline[i] += ptDisp;
        dline[len-i] += ptDisp;
    }
}

void displace (float displacedLine[], int lineLength, float range) {
    for (int p = 0; p < 100; p++) {
        int segs = pow(p, 2);
        for (int j = 0; j < segs; j++) {
            float lo = -range;
            float hi = +range;
            float change = lo + static_cast <float> (rand()) / (static_cast <float> (RAND_MAX / (hi - lo)));
            int start = int(float(j)/float(segs)*float(lineLength));
            int end = int(float(j+1)/float(segs)*float(lineLength));
            displaceMidPt (displacedLine+start,end-start,change);
        }
        range /= 2;
    }
}

void plot1D (float x[], int len, int ht = 10) {
    float minX = *min_element(x,x+len);
    float maxX = *max_element(x,x+len);
    int xi[len];
    for (int i = 0; i < len; i++) {
        xi[i] = int(ht*(x[i] - minX)/(maxX - minX) + 0.5);
    }
    char s[len+1];
    s[len] = '\0';
    for (int j = ht; j >= 0; j--) {
        for (int i = 0; i < len; i++) {
            if (xi[i] == j) {
                s[i] = '*';
            } else {
                s[i] = ' ';
            }
        }
        cout << s << endl;
    }
}

int main () {
    srand(time(NULL));
    const int lineLength = 65;
    float range = 1.0;
    float displacedLine[lineLength];
    for (int i = 0; i < lineLength; i++) {
        displacedLine[i] = 0.0;
    }
    displace (displacedLine,lineLength,range);
    plot1D (displacedLine,lineLength);
    return 0;
}
When run this way, it produces the following result:
$ c++ -lm displace.cpp
$ ./a
*
* *
* ***
* * * *
* ** **** * **
* *** **** * * * ** *
* * ** ** *** * * * *
** ** *
* * * ***
** ***
*

Laguerre interpolation algorithm, something's wrong with my implementation

This is a problem I have been struggling with for a week, coming back to it just to give up after wasted hours...
I am supposed to find coefficients for the following Laguerre polynomials:
P0(x) = 1
P1(x) = 1 - x
Pn(x) = ((2n - 1 - x) / n) * Pn-1(x) - ((n - 1) / n) * Pn-2(x)
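(As a quick sanity check of the recurrence: for n = 2 it gives P2(x) = ((3 - x) / 2) * (1 - x) - 1/2 = 1 - 2x + x^2 / 2, which is a handy value to test the polynomial routine against.)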
I believe there is an error in my implementation, because for some reason the coefficients I get seem way too big. This is the output this program generates:
a1 = -190.234
a2 = -295.833
a3 = 378.283
a4 = -939.537
a5 = 774.861
a6 = -400.612
Description of the code (given below):
If you scroll down a little in the code to the part where I declare the arrays, you'll find the given x's and y's.
The function polynomial just fills an array with the values of said polynomial for a certain x. It's a recursive function. I believe it works well, because I have checked the output values.
The gauss function finds the coefficients by performing Gaussian elimination on the output array. I think this is where the problems begin. I am wondering whether there's a mistake in this code, or perhaps my method of verifying the results is bad? I am trying to verify them like this:
-190.234 * 1.5^5 - 295.833 * 1.5^4 ... - 400.612 = -3017.817625 ≠ 2
Code:
#include "stdafx.h"
#include <conio.h>
#include <iostream>
#include <iomanip>
#include <math.h>
using namespace std;
double polynomial(int i, int j, double **tab)
{
double n = i;
double **array = tab;
double x = array[j][0];
if (i == 0) {
return 1;
} else if (i == 1) {
return 1 - x;
} else {
double minusone = polynomial(i - 1, j, array);
double minustwo = polynomial(i - 2, j, array);
double result = (((2.0 * n) - 1 - x) / n) * minusone - ((n - 1.0) / n) * minustwo;
return result;
}
}
int gauss(int n, double tab[6][7], double results[7])
{
double multiplier, divider;
for (int m = 0; m <= n; m++)
{
for (int i = m + 1; i <= n; i++)
{
multiplier = tab[i][m];
divider = tab[m][m];
if (divider == 0) {
return 1;
}
for (int j = m; j <= n; j++)
{
if (i == n) {
break;
}
tab[i][j] = (tab[m][j] * multiplier / divider) - tab[i][j];
}
for (int j = m; j <= n; j++) {
tab[i - 1][j] = tab[i - 1][j] / divider;
}
}
}
double s = 0;
results[n - 1] = tab[n - 1][n];
int y = 0;
for (int i = n-2; i >= 0; i--)
{
s = 0;
y++;
for (int x = 0; x < n; x++)
{
s = s + (tab[i][n - 1 - x] * results[n-(x + 1)]);
if (y == x + 1) {
break;
}
}
results[i] = tab[i][n] - s;
}
}
int _tmain(int argc, _TCHAR* argv[])
{
int num;
double **array;
array = new double*[5];
for (int i = 0; i <= 5; i++)
{
array[i] = new double[2];
}
//i 0 1 2 3 4 5
array[0][0] = 1.5; //xi 1.5 2 2.5 3.5 3.8 4.1
array[0][1] = 2; //yi 2 5 -1 0.5 3 7
array[1][0] = 2;
array[1][1] = 5;
array[2][0] = 2.5;
array[2][1] = -1;
array[3][0] = 3.5;
array[3][1] = 0.5;
array[4][0] = 3.8;
array[4][1] = 3;
array[5][0] = 4.1;
array[5][1] = 7;
double W[6][7]; //n + 1
for (int i = 0; i <= 5; i++)
{
for (int j = 0; j <= 5; j++)
{
W[i][j] = polynomial(j, i, array);
}
W[i][6] = array[i][1];
}
for (int i = 0; i <= 5; i++)
{
for (int j = 0; j <= 6; j++)
{
cout << W[i][j] << "\t";
}
cout << endl;
}
double results[6];
gauss(6, W, results);
for (int i = 0; i < 6; i++) {
cout << "a" << i + 1 << " = " << results[i] << endl;
}
_getch();
return 0;
}
I believe your interpretation of the recursive polynomial generation either needs revising or is a bit too clever for me.
given P[0][5] = {1,0,0,0,0,...}; P[1][5]={1,-1,0,0,0,...};
then P[2] is a*P[0] + convolution(P[1], { c, d });
where a = -((n - 1) / n), c = (2n - 1) / n, and d = -1 / n.
This can be generalized: P[n] == a*P[n-2] + conv(P[n-1], { c,d });
Every step involves a polynomial multiplication by (c + d*x), which increases the degree by one (just by one...), plus adding P[n-2] multiplied by the scalar a.
Then most likely the interpolation factor x is in range [0..1].
(Convolution means that you should implement polynomial multiplication, which luckily is easy...)
      [a, b, c, d]
    *       [e, f]
------------------
    af, bf, cf, df   +
ae, be, ce, de,  0
--------------------------
(column sums = coefficients of the final polynomial)
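If it helps, here is a small sketch of that multiplication step, with coefficients stored lowest degree first (my own illustration of the scheme described above, not a drop-in fix for the program in the question):

#include <vector>

// Polynomial multiplication ("convolution"): r[i + j] accumulates p[i] * q[j].
std::vector<double> conv(const std::vector<double>& p, const std::vector<double>& q) {
    std::vector<double> r(p.size() + q.size() - 1, 0.0);
    for (std::size_t i = 0; i < p.size(); ++i)
        for (std::size_t j = 0; j < q.size(); ++j)
            r[i + j] += p[i] * q[j];
    return r;
}

// P[n] = conv(P[n-1], {c, d}) + a * P[n-2], with c = (2n - 1)/n, d = -1/n, a = -(n - 1)/n.
std::vector<double> next_laguerre(const std::vector<double>& pn1,
                                  const std::vector<double>& pn2, int n) {
    std::vector<double> r = conv(pn1, { (2.0 * n - 1) / n, -1.0 / n });
    double a = -(n - 1.0) / n;
    for (std::size_t i = 0; i < pn2.size(); ++i)
        r[i] += a * pn2[i];
    return r;
}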
The definition of P1(x) = x - 1 is not implemented as stated. You have 1 - x in the computation.
I did not look any further.