My implementation of the Durand-Kerner method (https://en.wikipedia.org/wiki/Durand%E2%80%93Kerner_method) does not seem to work. I believe (see the following code) that I am not calculating the new approximation correctly in the algorithm part itself. I cannot seem to fix the problem. I would be very grateful for any advice.
#include <complex>
#include <cmath>
#include <vector>
#include <iostream>
#include "DurandKernerWeierstrass.h"
using namespace std;
using Complex = complex<double>;
using vec = vector<Complex>;
using Matrix = vector<vector<Complex>>;
//PRE: Receives input value of polynomial, degree and coefficients
//POST: Outputs y(x) value
Complex Polynomial(vec Z, int n, Complex x) {
Complex y = pow(x, n);
for (int i = 0; i < n; i++){
y += Z[i] * pow(x, (n - i - 1));
}
return y;
}
/*PRE: Takes a test value, degree of polynomial, vector of coefficients and the desired
precision of polynomial roots to calculate the roots*/
//POST: Outputs the roots of Polynomial
Matrix roots(vec Z, int n, int iterations, const double precision) {
Complex z = Complex(0.4, 0.9);
Matrix P(iterations, vec(n, 0));
Complex w;
//Creating Matrix with initial starting values
for (int i = 0; i < n; i++) {
P[0][i] = pow(z, i);
}
//Durand Kerner Algorithm
for (int col = 0; col < iterations; col++) {
//I believe this is the point where everything is going wrong
for (int row = 0; row < n; row++) {
Complex g = Polynomial(Z, n, P[col][row]);
for (int k = 0; k < n; k++) {
if (k != row) {
g = g / (P[col][row] - P[col][k]);
}
}
P[col][row] -= g;
}
return P;
}
}
The following is the code I am using to test the function:
int main() {
//Initializing section
vec A = {1, -3, 3,-5 };
int n = 3;
int iterations = 10;
const double precision = 1.0e-10;
Matrix p = roots(A, n, iterations,precision);
for (int i = 0; i < iterations; i++) {
for (int j = 0; j < n; j++) {
cout << "p[" << i << "][" << j << "] = " << p[i][j] << " ";
}
cout << endl;
}
return 0;
}
Important to note: the code uses a header file (DurandKernerWeierstrass.h) which is not included in this post.
Your problem is that you do not transcribe the new values into the next data record with index col+1. Thus in the next loop you start again with a data set of zero entries. Change to
P[col+1][row] = P[col][row] - g;
If you want to use the new improved approximation immediately for all following approximations, then use
P[col+1][row] = (P[col][row] -= g);
Then the data sets all contain the next approximations; in particular, the first one will no longer contain the initially set powers.
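For reference, here is a minimal sketch of the corrected loop (first variant), keeping the names from the question. Note that the loop bound drops by one so that P[col+1] stays inside the matrix, and the return has to sit outside the loop so that all iterations actually run:
//Durand-Kerner algorithm (corrected sketch)
for (int col = 0; col < iterations - 1; col++) {
for (int row = 0; row < n; row++) {
Complex g = Polynomial(Z, n, P[col][row]);
for (int k = 0; k < n; k++) {
if (k != row) {
g = g / (P[col][row] - P[col][k]);
}
}
P[col + 1][row] = P[col][row] - g; //carry the new value into the next record
}
}
return P; //only after all iterations have run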
I am trying to compute a Jacobi iteration by following the pseudocode I found on wikipedia. I have run my code through gdb and I find that I have a heap-buffer-overflow whenever I try to compute the sum of my Matrix and vector being multiplied together.
Here is my code:
std::vector<double> sol(std::vector<double> &x,std::vector<std::vector<double> > &A, std::vector<double> &b, int n)
{
double sum;
int counter = n;
while(counter != 0)
{
for (int i = 1; i <= n; ++i)
{
sum = 0.0;
for (int j = 1; j <= n; ++j)
{
if(j != i)
{
sum += A[i][j]*x[j]; //Issue seems to be here in GDB
std::cout << "Sum " << sum << std::endl;
}
}
x[i] = (1.0/A[i][i])*(b[i]-sum);
for(auto&& e : x)
{
std::cout << e << " ";
}
std::cout << std::endl;
}
counter--;
}
return x;
}
int main()
{
//const int SIZE = 1000;
const int SIZE = 2;
double ranNumber = 0.0;
std::vector<std::vector<double> > A;
std::vector<double> testX = {1.0,1.0};
std::vector<double> testB = {11.0,13.0};
for (int i = 0; i < SIZE; ++i)
{
std::vector<double> k;
for(int j = 0; j < SIZE; ++j)
{
ranNumber = randNumber();
k.emplace_back(ranNumber);
}
A.emplace_back(k);
}
A[0][0] = 2.0;
A[0][1] = 1.0;
A[1][0] = 5.0;
A[1][1] = 7.0;
std::vector<double> xSol = sol(testX,A,testB,30);
for(auto &&e:xSol)
{
std::cout << e << " ";
}
std::cout << std::endl;
return 0;
}
According to the wiki, I should receive the answer 7.1111, -3.2222. I think I have followed the pseudocode, except for the k part, because I am not quite sure how to implement that with a vector.
What is causing the segmentation fault? Am I going out of bounds in my vector or Matrix? That is what leads me to think I am seg faulting but I am not sure exactly what is going on here. Any help will be much appreciated.
Thanks
EDIT: I should clarify, yes, this is a terrible way to have a vector of vectors implemented. This is just a test to see if I can replicate what they have on Wikipedia. If I can get this answer, I will remove the unnecessary A[0][0]...etc. I have a random number function that will generate the numbers for me. But this is just to make sure this is working correctly.
First, you have the indexing issue: valid indices run from 0 to n-1, but you loop from 1 to n.
Then you construct your vector A in main as 2 x 2, but you call the function with n = 30, so you iterate through A[i][j] with i and j going up to 30 and access the array out of bounds! Call the function using SIZE, because you construct the matrix based on SIZE.
Finally, you divide by A[i][i] without first ensuring that it's not a divide by zero. (OK, it's not here, but you should verify as a kind of reflex.)
I don't know if you'll get the correct answer, but you should no longer experience core dumps.
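To make that concrete, here is a minimal 0-based sketch of the function. I split the matrix size and the number of sweeps into two parameters (the extra parameter is my addition), so n can safely be SIZE; note that updating x in place like this is technically Gauss-Seidel rather than Jacobi, exactly as in the original:
std::vector<double> sol(std::vector<double> &x, std::vector<std::vector<double> > &A, std::vector<double> &b, int n, int iters)
{
for (int it = 0; it < iters; ++it) // number of sweeps over the system
{
for (int i = 0; i < n; ++i) // 0-based: valid indices are 0..n-1
{
double sum = 0.0;
for (int j = 0; j < n; ++j)
{
if (j != i)
sum += A[i][j] * x[j];
}
x[i] = (1.0 / A[i][i]) * (b[i] - sum); // assumes A[i][i] != 0
}
}
return x;
}
// called as: sol(testX, A, testB, SIZE, 30);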
I have a vector obj (of class Holder) with N elements, whose members like x and y are themselves vectors of double with M elements. I would like to write a text file creating an MxN matrix from this. I have tried lots of different things to no avail up to now.
vector<Holder> obj(N);
void savedata(string filename, vector<Holder> obj, int M, int N) {
ofstream out(filename);
for(int i = 0; i < M; i++) {
for(int j = 0; j < N; j++) {
out << obj[i][j] << "\t" << endl;
}
}
}
But this just takes the last set of values. How can I create such an MxN matrix where rows are from the object member vector x and columns are from the object vector itself?
Thank you in advance.
--
The bigger version of the code is as follows:
#include <iostream>
#include <cmath>
#include <fstream>
#include <string>
#include <vector>
#include <random>
using namespace std;
typedef vector< vector<double> > Matrix;
// Particles making up the cell
class Particle{
public:
double x; // x position
double y; // y position
double vx; // velocity in the x direction
double vy; // velocity in the y direction
double Fx; // force in the x direction
double Fy; // force in the y direction
// Default constructor
Particle()
: x(0.0),y(0.0),vx(0.0),vy(0.0),Fx(0.0),Fy(0.0){
}
};
// Holder for storing data
class HoldPar{
public:
vector<double> x;
vector<double> y;
vector<double> vx;
vector<double> vy;
// Default constructor
HoldPar()
: x(0.0),y(0.0),vx(0.0),vy(0.0){
}
// Add elements to vectors
void add_Xelement(double a) {
x.push_back(a);
}
void add_Yelement(double a) {
y.push_back(a);
}
void add_VXelement(double a) {
vx.push_back(a);
}
void add_VYelement(double a) {
vy.push_back(a);
}
};
int main() {
// Initialization of x, v and F
const float pi = 3.14;
int N = 30; // Number of 'particles' that make up the cell
float theta = 2*pi/N; // Angle between two particles in radians
float x0 = 0; // Center of the cell [x]
float y0 = 0; // Center of the cell [y]
float R = 5e-6; // Radius of the cell
vector<Particle> particles(N); // particles
// Assigning the initial points onto the circle
for(int i = 0; i < N; i++) {
particles[i].x = x0 + R*cos(theta*i);
particles[i].y = y0 + R*sin(theta*i);
}
float k = 4.3e-7; // Spring constant connecting the particles
float m = 2e-8; // Mass of the particles
// Calculating the initial spring force between the particles on the cell
particles[0].Fx = -k*(particles[1].x - particles[N-1].x); // neighbours of particle 0 are 1 and N-1 (indices run 0..N-1)
particles[0].Fy = -k*(particles[1].y - particles[N-1].y);
for(int i = 1; i < N-1; i++) {
particles[i].Fx = -k*(particles[i+1].x - particles[i-1].x);
particles[i].Fy = -k*(particles[i+1].y - particles[i-1].y);
}
particles[N-1].Fx = -k*(particles[0].x - particles[N-2].x); // last particle: neighbours are 0 and N-2
particles[N-1].Fy = -k*(particles[0].y - particles[N-2].y);
// Initial velocities are given to each particle randomly from a Gaussian distribution
random_device rdx; // Seed
default_random_engine generatorx(rdx()); // Default random number generator
random_device rdy; // Seed
default_random_engine generatory(rdy()); // Default random number generator
normal_distribution<float> distributionx(0,1); // Gaussian distribution with 0 mean and 1 variance
normal_distribution<float> distributiony(0,1); // Gaussian distribution with 0 mean and 1 variance
for(int i = 0; i < N; i++) {
float xnumber = distributionx(generatorx);
float ynumber = distributiony(generatory);
particles[i].vx = xnumber;
particles[i].vy = ynumber;
}
// Molecular dynamics simulation with velocity Verlet algorithm
// 'Old' variables
vector<Particle> particles_old(N);
for(int i = 0; i < N; i++) {
particles_old[i].x = particles[i].x;
particles_old[i].y = particles[i].y;
particles_old[i].vx = particles[i].vx;
particles_old[i].vy = particles[i].vy;
particles_old[i].Fx = particles[i].Fx;
particles_old[i].Fy = particles[i].Fy;
}
// Sampling variables
int sampleFreq = 2;
int sampleCounter = 0;
// MD variables
float dt = 1e-4;
float dt2 = dt*dt;
float m2 = 2*m;
int MdS = 1e+5; // Molecular dynamics step number
// Holder variables
vector<HoldPar> particles_hold(N);
// MD
for(int j = 0; j < MdS; j++) {
// Update x
for(int i = 0; i < N; i++) {
particles[i].x = particles_old[i].x + dt*particles_old[i].vx + dt2*particles_old[i].Fx/m2;
particles[i].y = particles_old[i].y + dt*particles_old[i].vy + dt2*particles_old[i].Fy/m2;
}
// Update F
particles[0].Fx = -k*(particles[1].x - particles[N-1].x);
particles[0].Fy = -k*(particles[1].y - particles[N-1].y);
for(int i = 1; i < N-1; i++) {
particles[i].Fx = -k*(particles[i+1].x - particles[i-1].x);
particles[i].Fy = -k*(particles[i+1].y - particles[i-1].y);
}
particles[N-1].Fx = -k*(particles[0].x - particles[N-2].x);
particles[N-1].Fy = -k*(particles[0].y - particles[N-2].y);
// Update v
for(int i = 0; i < N; i++) {
particles[i].vx = particles_old[i].vx + dt*(particles_old[i].Fx + particles[i].Fx)/m2;
particles[i].vy = particles_old[i].vy + dt*(particles_old[i].Fy + particles[i].Fy)/m2;
}
// Copy new variables to old variables
for(int i = 0; i < N; i++) {
particles_old[i].x = particles[i].x;
particles_old[i].y = particles[i].y;
particles_old[i].vx = particles[i].vx;
particles_old[i].vy = particles[i].vy;
particles_old[i].Fx = particles[i].Fx;
particles_old[i].Fy = particles[i].Fy;
}
// Store variables
if(j % sampleFreq == 0) {
for(int i = 0; i < N; i++) {
particles_hold[i].add_Xelement( particles[i].x );
particles_hold[i].add_Yelement( particles[i].y );
particles_hold[i].add_VXelement( particles[i].vx );
particles_hold[i].add_VYelement( particles[i].vy );
}
sampleCounter += 1;
}
}
// End of molecular dynamics simulation
}
Essentially I'm trying to write a txt file where particles_hold elements (from 1 to N) are columns and members of particles_hold elements like x (from 1 to some value M) are rows.
If you mean visually, then the way is to put endl or "\n" in the outer loop and remove endl from the inner loop. But I do not know anything about your Holder object; if you have an [] operator defined there, then that is the answer.
vector<Holder> obj(N);
void savedata(string filename, vector<Holder> obj, int M, int N) {
ofstream out(filename);
for(int i = 0; i < M; i++) {
for(int j = 0; j < N; j++) {
out << obj[i][j] << "\t";
}
out<< "\n";
}
}
Your method is OK; however, I made some minor changes so that you have M lines, each line representing obj[i], i = 0..M-1. So each column (jth index) is printed tab-separated on each line:
vector<Holder> obj(N);
void savedata(string filename, vector<Holder> obj, int M, int N) {
ofstream out(filename);
for(int i = 0; i < M; i++) {
for(int j = 0; j < N; j++) {
out << obj[i][j] << "\t";
}
out << endl;
}
}
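If Holder has no [] operator, the same loop structure works by addressing the member vectors directly. A sketch, assuming the member is called x as in your HoldPar class above, with rows taken from the member vector and columns from the object vector:
void savedata(const string& filename, const vector<HoldPar>& obj, int M, int N) {
ofstream out(filename);
for (int i = 0; i < M; i++) { // i walks each member vector (rows)
for (int j = 0; j < N; j++) { // j walks the objects (columns)
out << obj[j].x[i] << "\t";
}
out << "\n";
}
}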
I've looked up some websites but I couldn't find an answer to my problem.
Here's my code:
#include "stdafx.h"
#include <iostream>
#include <math.h>
#include <time.h>
#include<iomanip>
#include<array>
#include <algorithm>
using namespace std;
const int AS = 6;
int filling(void);
void printing(int[AS][AS]);
int forsorting(int[][AS], int);
int main()
{
int funny = 0;
int timpa = 0;
int counter = 0;
int Array[AS][AS];
srand(time(0));
for (int i = 0; i<AS; i++)
{
for (int j = 0; j<AS; j++)
Array[i][j] = filling();
}
cout << "The unsorted array is" << endl << endl;
printing(Array);
cout << "The sorted array is" << endl << endl;
for (int il = 0; il<AS; il++)
{
for (int elle = 0; elle<AS; elle++)
Array[il][elle] =forsorting(Array, funny);
printing(Array);
}
system("PAUSE");
return 0;
}
int filling(void)
{
int kira;
kira = rand() % 87 + 12;
return kira;
}
void printing(int Array[AS][AS])
{
int counter = 0;
for (int i = 0; i<AS; i++)
{
for (int j = 0; j<AS; j++)
{
cout << setw(5) << Array[i][j];
counter++;
if (counter%AS == 0)
cout << endl << endl;
}
}
}
int forsorting(int Array[AS][AS], int funny)
{
int c, tmp, x;
int dice = 0;
int Brray[AS*AS];
int timpa = 0;
int super = 0;
//Transofrming Array[][] into Brray[]
for (int i = 0; i < AS; i++)
{
for (int k = 0; k < AS; k++)
{
Brray[timpa] = Array[i][k];
timpa++;
}
}
//Bubble sorting in Brray[]
for (int passer = 1; passer <= AS-1; passer++)
{
for (int timon = 1; timon <= AS-1; timon++)
{
if (Brray[timpa]>Brray[timpa + 1])
{
super = Brray[timpa];
Brray[timpa] = Brray[timpa + 1];
Brray[timpa + 1] = super;
}
}
}
//Transforming Brray[] into Array[][]
for (int e = 0; e<AS; e++)
{
for (int d = 0; d<AS; d++)
{
Brray[dice] = Array[e][d];
dice++;
}
}
// *** There's a part missing here ***
}
What I have to do is write a program using 3 functions.
The 1st function would fill my 2D array randomly (no problem with this part)
the 2nd function would print the unsorted array on the screen (no problem with this part)
and the 3rd function would sort my array diagonally as shown in this picture:
Then I need to call the 2nd function to print the sorted array. My problem is with the 3rd function: I turned my 2D array into a 1D array and sorted it using bubble sort, but what I can't do is turn it back into a 2D array diagonally sorted.
If you can convert from a 2D array to a 1D array, then converting back is the reverse process. Take the same loop and change around the assignment.
However in your case the conversion itself is wrong. It should take indexes in the order (0;0), (0;1), (1;0). But what it does is take indexes in the order (0;0), (0;1), (1;1).
My suggestion is to use the fact that the sum of the X and Y coordinates on each diagonal is the same and it goes from 0 to AS*2-2.
Then with another loop you can check for all possible valid x/y combinations. Something like this:
for ( int sum = 0; sum < AS*2-1; sum++ )
{
for ( int y = sum >= AS ? sum-AS+1 : 0; y < AS; y++ )
{
int x = sum - y;
// Here assign either from Array to Brray or from Brray to Array
}
}
P.S. If you want to be really clever, I'm pretty sure that you can make a mathematical (non-iterative) function that converts from the index in Brray to an index-pair in Array, and vice-versa. Then you can apply the bubble-sort in place. But that's a bit more tricky than I'm willing to figure out right now. You might get extra credit for that though.
P.P.S. Realization next morning: you can use this approach to implement the bubble sort directly in the 2D array. No need for copying. Think of it this way: if you know a pair of (x;y) coordinates, you can easily figure out the next (x;y) coordinate on the list. So you can move forwards through the array from any point. That is all the bubble sort needs anyway.
Suppose you have a 0-based 1-dimensional array A of n = m^2 elements. I'm going to tell you how to get an index into A, given a pair of indices into a 2D array, according to your diagonalization method. I'll call i the (0-based) index in A, and x and y the (0-based) indices in the 2D array.
First, let's suppose we know x and y. All of the entries in the diagonal containing (x,y) have the same sum of their coordinates. Let sum = x + y. Before you got to the diagonal containing this entry, you iterated through the diagonals with coordinate sums 0, 1, ..., sum - 1 (check that this is right, due to zero-based indexing). The diagonal having sum k has a total of k + 1 entries, so those earlier diagonals contain 1 + 2 + ... + sum entries in total. There is a formula for a sum of the form 1 + 2 + ... + N, namely N * (N + 1) / 2. So, before getting to this diagonal, you iterated through sum * (sum + 1) / 2 entries.
Now, before getting to the entry at (x,y), you went through a few entries in this very diagonal, didn't you? How many? Why, it's exactly y! You start at the top entry and go down one at a time. So, the entry at (x,y) has sum * (sum + 1) / 2 + y entries before it, and since the array is zero-based too, that count is exactly its index. So, we get the formula:
i = sum * (sum + 1) / 2 + y = (x + y) * (x + y + 1) / 2 + y
(This is the classic Cantor pairing function. Note it ignores the clipping of the diagonals near the bottom-right corner of a bounded m x m array, i.e. it is valid for sum < m.)
To go backward, we want to start with i, and figure out the (x,y) pair in the 2D array where the element A[i] goes. Because we are solving for two variables (x and y) starting with one (just i) and a constraint, it is trickier to write down a closed formula. In fact I'm not convinced that a closed form is possible, and certainly not without some floors, etc. I began trying to find one and gave up! Good luck!
It's probably correct and easier to just generate the (x,y) pairs iteratively as you increment i, keeping in mind that the sums of coordinate pairs are constant within one of your diagonals.
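A sketch of that iterative generation, stepping through the diagonal order while copying Brray back into Array (the same stepping works for the forward copy, or for the in-place sort suggested above):
int x = 0, y = 0; // current position; y is the row, x the column
for (int i = 0; i < AS * AS; i++)
{
Array[y][x] = Brray[i];
// next entry on the same diagonal: one step down-left
y++;
x--;
if (x < 0 || y >= AS) // fell off the diagonal: start the next one
{
int sum = x + y + 1; // coordinate sum of the next diagonal
y = sum >= AS ? sum - AS + 1 : 0;
x = sum - y;
}
}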
Store the "diagonally sorted" numbers into an array and use this to display your sorted array. For ease, assume 0-based indexing:
char order[] = { 0, 1, 3, 6, 10, 2, 4, 7, 11, 15, .. (etc)
Then loop over this array and display as
printf ("%d", Array[order[x]]);
Note that it is easier if your sorted Array is still one-dimensional at this step. You'd add the second dimension only when printing.
Following may help you:
#include <algorithm>
#include <iomanip>
#include <iostream>
#include <vector>
template<typename T>
class DiagArray
{
public:
DiagArray(int size) : width(size), data(size * size), orders(size * size)
{
buildTableOrder(size);
}
const T& operator() (int x, int y) const { return data[orders[width * y + x]]; }
T& operator() (int x, int y) { return data[orders[width * y + x]]; }
void sort() { std::sort(data.begin(), data.end()); }
void display() const {
int counter = 0;
for (auto index : orders) {
std::cout << std::setw(5) << data[index];
counter++;
if (counter % width == 0) {
std::cout << std::endl;
}
}
}
private:
void buildTableOrder(int size)
{
int diag = 0;
int x = 0;
int y = 0;
for (int i = 0; i != size * size; ++i) {
orders[y * size + x] = i;
++y;
--x;
if (x < 0 || y >= size) {
++diag;
x = std::min(diag, size - 1);
y = diag - x;
}
}
}
private:
int width;
std::vector<T> data;
std::vector<int> orders;
};
int main(int argc, char *argv[])
{
const int size = 5;
DiagArray<int> da(size);
for (int y = 0; y != size; ++y) {
for (int x = 0; x != size; ++x) {
da(x, y) = size * y + x;
}
}
da.display();
std::cout << std::endl;
da.sort();
da.display();
return 0;
}
Thank you for your assistance everyone; what you said was very useful to me. I was actually able to think about it clearly and came up with a way to start filling the array based on your recommendation. But there's one problem now: I'm pretty sure that my logic is 99% right, but there's a flaw somewhere. After I run my code the 2nd array isn't printed on the screen. Any help with this?
#include "stdafx.h"
#include <iostream>
#include <math.h>
#include <time.h>
#include<iomanip>
#include<array>
#include <algorithm>
using namespace std;
const int AS = 5;
int filling(void);
void printing(int[AS][AS]);
int forsorting(int[][AS], int);
int main()
{
int funny = 0;
int timpa = 0;
int counter = 0;
int Array[AS][AS];
srand(time(0));
for (int i = 0; i<AS; i++)
{
for (int j = 0; j<AS; j++)
Array[i][j] = filling();
}
cout << "The unsorted array is" << endl << endl;
printing(Array);
cout << "The sorted array is" << endl << endl;
for (int il = 0; il<AS; il++)
{
for (int elle = 0; elle<AS; elle++)
Array[il][elle] =forsorting(Array, funny);
}
printing(Array);
system("PAUSE");
return 0;
}
int filling(void)
{
int kira;
kira = rand() % 87 + 12;
return kira;
}
void printing(int Array[AS][AS])
{
int counter = 0;
for (int i = 0; i<AS; i++)
{
for (int j = 0; j<AS; j++)
{
cout << setw(5) << Array[i][j];
counter++;
if (counter%AS == 0)
cout << endl << endl;
}
}
}
int forsorting(int Array[AS][AS], int funny)
{int n;
int real;
int dice = 0;
int Brray[AS*AS];
int timpa = 0;
int super = 0;
int median;
int row=0;
int col=AS-1;
//Transofrming Array[][] into Brray[]
for (int i = 0; i < AS; i++)
{
for (int k = 0; k < AS; k++)
{
Brray[timpa] = Array[i][k];
timpa++;
}
}
//Bubble sorting in Brray[]
for (int passer = 1; passer <= AS-1; passer++)
{
for (int timon = 1; timon <= AS-1; timon++)
{
if (Brray[timpa]>Brray[timpa + 1])
{
super = Brray[timpa];
Brray[timpa] = Brray[timpa + 1];
Brray[timpa + 1] = super;
}
}
}
//Transforming Brray[] into sorted Array[][]
for(int e=4;e>=0;e--)//e is the index of the diagonal we're working in
{
if(AS%2==0)
{median=0.5*(Brray[AS*AS/2]+Brray[AS*AS/2-1]);
//We start filling at median - Brray[AS*AS/2-1]
while(row<5 && col>=0)
{real=median-Brray[AS*AS/2-1];
Array[row][col]=Brray[real];
real++;
col--;
row++;}
}
else {
median=Brray[AS*AS/2];
//We start filling at Brray[AS*AS/2-AS/2]
while(row<5 && col>=0)
{real=Brray[AS*AS/2-AS/2];
n=Array[row][col]=Brray[real];
real++;
col--;
row++;}
}
}
return n;
}
Thanks again for your assistance
/* Program to demonstrate gaussian elimination
on a set of linear simultaneous equations
*/
#include <iostream>
#include <cmath>
#include <vector>
using namespace std;
const double eps = 1.e-15;
/*Preliminary pivoting strategy
Pivoting function
*/
double pivot(vector<vector<double> > &a, vector<double> &b, int i)
{
int n = a.size();
int j=i;
double t=0;
for(int k=i; k<n; k+=1)
{
double aki = fabs(a[k][i]);
if(aki>t)
{
t=aki;
j=k;
}
}
if(j>i)
{
double dummy;
for(int L=0; L<n; L+=1)
{
dummy = a[i][L];
a[i][L]= a[j][L];
a[j][L]= dummy;
}
double temp = b[i]; // swap b[i] and b[j]
b[i] = b[j];
b[j] = temp;
}
return a[i][i];
}
/* Forward elimination */
void triang(vector<vector<double> > &a, vector<double> &b)
{
int n = a.size();
for(int i=0; i<n-1; i+=1)
{
double diag = pivot(a,b,i);
if(fabs(diag)<eps)
{
cout<<"zero det"<<endl;
return;
}
for(int j=i+1; j<n; j+=1)
{
double mult = a[j][i]/diag;
for(int k = i+1; k<n; k+=1)
{
a[j][k]-=mult*a[i][k];
}
b[j]-=mult*b[i];
}
}
}
/*
DOT PRODUCT OF TWO VECTORS
*/
double dotProd(vector<double> &u, vector<double> &v, int k1,int k2)
{
double sum = 0;
for(int i = k1; i <= k2; i += 1)
{
sum += u[i] * v[i];
}
return sum;
}
/*
BACK SUBSTITUTION STEP
*/
void backSubst(vector<vector<double> > &a, vector<double> &b, vector<double> &x)
{
int n = a.size();
for(int i = n-1; i >= 0; i -= 1)
{
x[i] = (b[i] - dotProd(a[i], x, i + 1, n-1))/ a[i][i];
}
}
/*
REFINED GAUSSIAN ELIMINATION PROCEDURE
*/
void gauss(vector<vector<double> > &a, vector<double> &b, vector<double> &x)
{
triang(a, b);
backSubst(a, b, x);
}
// EXAMPLE MAIN PROGRAM
int main()
{
int n;
cin >> n;
vector<vector<double> > a;
vector<double> x;
vector<double> b;
for (int i = 0; i < n; i++) {
vector<double> temp;
for (int j = 0; j < n; j++) {
int no;
cin >> no;
temp.push_back(no);
}
a.push_back(temp);
b.push_back(0);
x.push_back(0);
}
/*
for (int i = 0; i < n; i++) {
int no;
cin >> no;
b.push_back(no);
x.push_back(0);
}
*/
gauss(a, b, x);
for (size_t i = 0; i < x.size(); i++) {
cout << x[i] << endl;
}
return 0;
}
The above Gaussian elimination algorithm works fine on NxN matrices. But I need it to work on an NxM matrix. Can anyone help me to do it? I am not very good at maths. I got this code from some website and I am stuck on it.
(optional) Understand this. Do some examples on paper.
Don't write code for Gaussian elimination yourself. Without some care, naive Gaussian pivoting is unstable; you have to scale the rows and pivot on the greatest element (a starting point is there). Note that this advice holds for most linear algebra algorithms.
If you want to solve systems of equations, LU decomposition, QR decomposition (stabler than LU, but slower), Cholesky decomposition (in the case the system is symmetric) or SVD (in the case the system is not square) are almost always better choices. Gaussian elimination is best for computing determinants however.
Use the algorithms from LAPACK for the problems which need Gaussian elimination (eg. solving systems, or computing determinants). Really. Don't roll your own. Since you are doing C++, you may be interested in Armadillo which takes care of a lot of things for you.
If you must roll your own for pedagogical reasons, have a look first at Numerical Recipes, version 3. Version 2 can be found online for free if you're low on budget / have no access to a library.
As a general advice, don't code algorithms you don't understand.
You just cannot apply Gaussian elimination directly to an NxM problem. If you have more equations than unknowns, then your problem is over-determined and generally has no exact solution, which means you need to use something like the least squares method. Say that you have A*x = b; then instead of x = inv(A)*b (when N=M), you have to do x = inv(A^T*A)*A^T*b.
In the case where you have fewer equations than unknowns, your problem is under-determined and you have an infinity of solutions. In that case, you either pick one (e.g. by setting some of the unknowns to an arbitrary value), or you need to use regularization, which means adding some extra constraints.
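A sketch of the over-determined case, reusing the gauss() routine from the question on the normal equations A^T*A*x = A^T*b (both sides are then square in the N unknowns). Bear in mind that forming the normal equations squares the condition number, so this is only for illustration:
// A is m x n with m >= n; solves min ||A*x - b|| via the normal equations
void leastSquares(vector<vector<double> >& A, vector<double>& b, vector<double>& x)
{
size_t m = A.size(), n = A[0].size();
vector<vector<double> > M(n, vector<double>(n, 0.0));
vector<double> r(n, 0.0);
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < n; j++)
for (size_t k = 0; k < m; k++)
M[i][j] += A[k][i] * A[k][j]; // M = A^T * A
for (size_t k = 0; k < m; k++)
r[i] += A[k][i] * b[k]; // r = A^T * b
}
x.assign(n, 0.0);
gauss(M, r, x); // the NxN solver from the question
}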
You can apply echelon reduction, like in this snippet:
#include <iostream>
#include <algorithm>
#include <vector>
#include <iomanip>
#include <cfloat> // for DBL_EPSILON
using namespace std;
/*
A rectangular matrix is in echelon form(or row echelon form) if it has the following
three properties :
1. All nonzero rows are above any rows of all zeros.
2. Each leading entry of a row is in a column to the right of the leading entry of
the row above it.
3. All entries in a column below a leading entry are zeros.
If a matrix in echelon form satisfies the following additional conditions,
then it is in reduced echelon form(or reduced row echelon form) :
4. The leading entry in each nonzero row is 1.
5. Each leading 1 is the only nonzero entry in its column.
*/
template <typename C> void print(const C& c) {
for (const auto& e : c) {
cout << setw(10) << right << e;
}
cout << endl;
}
template <typename C> void print2(const C& c) {
for (const auto& e : c) {
print(e);
}
cout << endl;
}
// input matrix consists of rows, which are vectors of double
// (Reduce is a member of a Gauss class whose declaration is not shown)
vector<vector<double>> Gauss::Reduce(const vector<vector<double>>& matrix)
{
if (matrix.size() == 0)
throw string("Empty matrix");
auto A{ matrix };
auto mima = minmax_element(A.begin(), A.end(), [](const vector<double>& a, const vector<double>& b) {return a.size() < b.size(); });
auto mi = mima.first - A.begin(), ma = mima.second - A.begin();
if (A[mi].size() != A[ma].size())
throw string("All rows shall have equal length");
size_t height = A.size();
size_t width = A[0].size();
if (width == 0)
throw string("Only empty rows");
for (size_t row = 0; row != height; row++) {
cout << "processing row " << row << endl;
// Search for maximum below current row in column row and move it to current row; skip this step on the last one
size_t col{ row }, maxRow{ 0 };
// find pivot for current row (partial pivoting)
while (col < width)
{
maxRow = distance(A.begin(), max_element(A.begin() + row, A.end(), [col](const vector<double>& rowVectorA, const vector<double>& rowVectorB) {return abs(rowVectorA[col]) < abs(rowVectorB[col]); }));
if (A[maxRow][col] != 0) // nonzero in this row and column or below found
break;
++col;
}
if (col == width) // e.g. in current row and below all entries are zero
break;
if (row != maxRow)
{
swap(A[row], A[maxRow]);
cout << "swapped " << row << " and " << maxRow;
}
cout << " => leading entry in column " << col << endl;
print2(A);
// here col >= row holds; col is the column of the leading entry e.g. first nonzero column in current row
// moreover, all entries to the left and below are zeroed
if (row+1 < height)
cout << "processing column " << col << endl;
// Make in all rows below this one 0 in current column
for (size_t rowBelow = row + 1; rowBelow < height; rowBelow++) {
// subtract product of current row by factor
double factor = A[rowBelow][col] / A[row][col];
cout << "processing row " << rowBelow << " below the current; factor is " << factor << endl;
if (factor == 0)
continue;
for (size_t colRight{ col }; colRight < width; colRight++)
{
auto d = A[rowBelow][colRight] - factor * A[row][colRight];
A[rowBelow][colRight] = abs(d) < DBL_EPSILON ? 0 : d;
}
print(A[rowBelow]);
}
}
// the matrix A is in echelon form now
cout << "matrix in echelon form" << endl;
print2(A);
// reduced echelon form follows (backward phase)
size_t row(height-1);
auto findPivot = [&row, A] () -> size_t {
do
{
auto pos = find_if(A[row].begin(), A[row].end(), [](double d) {return d != 0; });
if (pos != A[row].end())
return pos - A[row].begin();
} while (row-- > 0);
return A[0].size();
};
do
{
auto col = findPivot();
if (col == width)
break;
cout << "processing row " << row << endl;
if (A[row][col] != 1)
{
//scale row row to make element at [row][col] equal one
auto f = 1 / A[row][col];
transform(A[row].begin()+col, A[row].end(), A[row].begin()+col, [f](double d) {return d * f; });
}
auto rowAbove{ row};
while (rowAbove > 0)
{
rowAbove--;
double factor = A[rowAbove][col];
if (abs(factor) > 0)
{
for (auto colAbove{ 0 }; colAbove < width; colAbove++)
{
auto d = A[rowAbove][colAbove] - factor * A[row][colAbove];
A[rowAbove][colAbove] = abs(d) < DBL_EPSILON ? 0 : d;
}
cout << "transformed row " << rowAbove << endl;
print(A[rowAbove]);
}
}
} while (row-- > 0);
return A;
}
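A minimal usage sketch, assuming Reduce is a static member of a Gauss class (only the member definition is shown above, so that part is an assumption):
int main()
{
// 3 equations in 4 columns: an augmented matrix for a 3x3 system
vector<vector<double>> m{
{ 2, 1, -1, 8 },
{ -3, -1, 2, -11 },
{ -2, 1, 2, -3 }
};
try {
auto rref = Gauss::Reduce(m); // reduced row echelon form
print2(rref); // last column now holds the solution (2, 3, -1)
}
catch (const string& err) {
cout << err << endl;
}
return 0;
}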
I have been trying to get a simple double XOR neural network to work and I am having problems getting backpropagation to train a really simple feed forward neural network.
I have mostly been trying to follow this guide in getting a neural network working, but have at best made programs that learn at an extremely slow rate.
As I understand neural networks:
Values are computed by taking the result of a sigmoid function of the sum of all inputs to that neuron. This is then fed to the next layer using the weight for each neuron.
At the end of a run the error is computed for the output neurons; then, using the weights, the error is propagated backwards by simply multiplying the values and then summing at each neuron.
When all of the errors are computed, the weights are adjusted by delta = weight of connection * derivative of the sigmoid (value of the neuron the weight is going to) * value of the neuron that the connection is to * error of the neuron * amount of output error of the neuron going to * beta (some constant for the learning rate).
This is my current muck of code that I am trying to get working. I have a lot of other attempts somewhat mixed in; the main backpropagation function I am trying to get working is on line 293 in Net.cpp.
Have a look at 15 Steps to implement a Neural Network; it should get you started.
I wrote a simple "Tutorial" that you can check out below.
It is a simple implementation of the perceptron model. You can imagine a perceptron as a neural network with only one neuron. There is of course code that you can test out, which I wrote in C++. I go through the code step by step so you shouldn't have any issues.
Although the perceptron isn't really a "Neural Network" it is really helpful if you want to get started and might help you better understand how a full Neural Network works.
Hope that helps!
Cheers! ^_^
In this example I will go through the implementation of the perceptron model in C++ so that you can get a better idea of how it works.
First things first it is a good practice to write down a simple algorithm of what we want to do.
Algorithm:
Make a vector for the weights and initialize it to 0 (don't forget to add the bias term)
Keep adjusting the weights until we get 0 errors or a low error count.
Make predictions on unseen data.
Having written a super simple algorithm let's now write some of the functions that we will need.
We will need a function to calculate the net's input (i.e. x * wT, multiplying the inputs by the weights)
A step function so that we get a prediction of either 1 or -1
And a function that finds the ideal values for the weights.
So without further ado let's get right into it.
Let's start simple by creating a perceptron class:
class perceptron
{
public:
private:
};
Now let's add the functions that we will need.
class perceptron
{
public:
perceptron(float eta,int epochs);
float netInput(vector<float> X);
int predict(vector<float> X);
void fit(vector< vector<float> > X, vector<float> y);
private:
};
Notice how the function fit takes as an argument a vector of vector<float>. That is because our training dataset is a matrix of inputs. Essentially we can imagine that matrix as a couple of vectors x stacked one on top of the other, with each column of that matrix being a feature.
Finally, let's add the values that our class needs: the vector w to hold the weights, the number of epochs (the number of passes that we will do over the training dataset), and the constant eta, which is the learning rate by which we multiply each weight update. Dialing this value up can make the training procedure faster; if eta is too high we can dial it down to get the ideal result (for most applications of the perceptron I would suggest an eta value of 0.1).
class perceptron
{
public:
perceptron(float eta,int epochs);
float netInput(vector<float> X);
int predict(vector<float> X);
void fit(vector< vector<float> > X, vector<float> y);
private:
float m_eta;
int m_epochs;
vector < float > m_w;
};
Now, with our class set, it's time to write each one of the functions.
We will start with the constructor ( perceptron(float eta, int epochs); )
perceptron::perceptron(float eta, int epochs)
{
m_epochs = epochs; // We set the private variable m_epochs to the user selected value
m_eta = eta; // We do the same thing for eta
}
As you can see, what we will be doing is very simple stuff. So let's move on to another simple function, the predict function ( int predict(vector<float> X); ). Remember that all the predict function does is take the net input and return a value of 1 if the netInput is bigger than 0, and -1 otherwise.
int perceptron::predict(vector<float> X)
{
return netInput(X) > 0 ? 1 : -1; //Step Function
}
Notice that we used an inline if statement to make our lives easier. Here's how the inline if statement works:
condition ? value_if_true : value_if_false
So far so good. Let's move on to implementing the netInput function ( float netInput(vector<float> X); ).
The netInput function does the following: it multiplies the input vector by the transpose of the weights vector,
x * wT
In other words, it multiplies each element of the input vector x by the corresponding element of the vector of weights w, then takes their sum and adds the bias:
(x1 * w1 + x2 * w2 + ... + xn * wn) + bias
bias = 1 * w0
float perceptron::netInput(vector<float> X)
{
// Sum(Vector of weights * Input vector) + bias
float probabilities = m_w[0]; // In this example I am adding the bias (w0) first
for (int i = 0; i < X.size(); i++)
{
probabilities += X[i] * m_w[i + 1]; // Notice that for the weights I am counting
// from the 2nd element since w0 is the bias and I already added it first.
}
return probabilities;
}
Alright, so we are now pretty much done; the last thing we need to do is write the fit function, which modifies the weights.
void perceptron::fit(vector< vector<float> > X, vector<float> y)
{
for (int i = 0; i < X[0].size() + 1; i++) // X[0].size() + 1 -> I am using +1 to add the bias term
{
m_w.push_back(0); // Setting each weight to 0 and making the size of the vector
// The same as the number of features (X[0].size()) + 1 for the bias term
}
for (int i = 0; i < m_epochs; i++) // Iterating through each epoch
{
for (int j = 0; j < X.size(); j++) // Iterating though each vector in our training Matrix
{
float update = m_eta * (y[j] - predict(X[j])); //we calculate the change for the weights
for (int w = 1; w < m_w.size(); w++){ m_w[w] += update * X[j][w - 1]; } // we update each weight by the update * the training sample
m_w[0] += update; // We update the bias term; its input is always 1, so we just add the update
}
}
}
So that was essentially it. With only 3 functions we now have a working perceptron class that we can use to make predictions!
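For instance, here is a quick (hypothetical) sanity check on a toy linearly separable problem, logical OR:
perceptron clf(0.1f, 10); // eta = 0.1, 10 epochs
vector< vector<float> > X = { {0,0}, {0,1}, {1,0}, {1,1} };
vector<float> y = { -1, 1, 1, 1 }; // OR, encoded as -1/1
clf.fit(X, y);
cout << clf.predict({1, 0}) << endl; // expected: 1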
In case you want to copy-paste the code and try it out, here is the entire class. (I added some extra functionality, such as printing the weights vector and the errors in each epoch, as well as the option to import/export weights.)
Here is the code:
The class header:
class perceptron
{
public:
perceptron(float eta,int epochs);
float netInput(vector<float> X);
int predict(vector<float> X);
void fit(vector< vector<float> > X, vector<float> y);
void printErrors();
void exportWeights(string filename);
void importWeights(string filename);
void printWeights();
private:
float m_eta;
int m_epochs;
vector < float > m_w;
vector < float > m_errors;
};
The class .cpp file with the functions:
perceptron::perceptron(float eta, int epochs)
{
m_epochs = epochs;
m_eta = eta;
}
void perceptron::fit(vector< vector<float> > X, vector<float> y)
{
for (int i = 0; i < X[0].size() + 1; i++) // X[0].size() + 1 -> I am using +1 to add the bias term
{
m_w.push_back(0);
}
for (int i = 0; i < m_epochs; i++)
{
int errors = 0;
for (int j = 0; j < X.size(); j++)
{
float update = m_eta * (y[j] - predict(X[j]));
for (int w = 1; w < m_w.size(); w++){ m_w[w] += update * X[j][w - 1]; }
m_w[0] += update; // bias update: the bias input is always 1
errors += update != 0 ? 1 : 0;
}
m_errors.push_back(errors);
}
}
float perceptron::netInput(vector<float> X)
{
// Sum(Vector of weights * Input vector) + bias
float probabilities = m_w[0];
for (int i = 0; i < X.size(); i++)
{
probabilities += X[i] * m_w[i + 1];
}
return probabilities;
}
int perceptron::predict(vector<float> X)
{
return netInput(X) > 0 ? 1 : -1; //Step Function
}
void perceptron::printErrors()
{
printVector(m_errors); // printVector is a small helper (not shown) that prints a vector's elements
}
void perceptron::exportWeights(string filename)
{
ofstream outFile;
outFile.open(filename);
for (int i = 0; i < m_w.size(); i++)
{
outFile << m_w[i] << endl;
}
outFile.close();
}
void perceptron::importWeights(string filename)
{
ifstream inFile;
inFile.open(filename);
for (int i = 0; i < m_w.size(); i++)
{
inFile >> m_w[i];
}
}
void perceptron::printWeights()
{
cout << "weights: ";
for (int i = 0; i < m_w.size(); i++)
{
cout << m_w[i] << " ";
}
cout << endl;
}
Also, if you want to try it out, here is an example I made:
main.cpp:
#include <iostream>
#include <vector>
#include <algorithm>
#include <fstream>
#include <string>
#include <math.h>
#include "MachineLearning.h"
using namespace std;
using namespace MachineLearning;
vector< vector<float> > getIrisX();
vector<float> getIrisy();
int main()
{
vector< vector<float> > X = getIrisX();
vector<float> y = getIrisy();
vector<float> test1;
test1.push_back(5.0);
test1.push_back(3.3);
test1.push_back(1.4);
test1.push_back(0.2);
vector<float> test2;
test2.push_back(6.0);
test2.push_back(2.2);
test2.push_back(5.0);
test2.push_back(1.5);
//printVector(X);
//for (int i = 0; i < y.size(); i++){ cout << y[i] << " "; }cout << endl;
perceptron clf(0.1, 14);
clf.fit(X, y);
clf.printErrors();
cout << "Now Predicting: 5.0,3.3,1.4,0.2(CorrectClass=-1,Iris-setosa) -> " << clf.predict(test1) << endl;
cout << "Now Predicting: 6.0,2.2,5.0,1.5(CorrectClass=1,Iris-virginica) -> " << clf.predict(test2) << endl;
system("PAUSE");
return 0;
}
vector<float> getIrisy()
{
vector<float> y;
ifstream inFile;
inFile.open("y.data");
string sampleClass;
for (int i = 0; i < 100; i++)
{
inFile >> sampleClass;
if (sampleClass == "Iris-setosa")
{
y.push_back(-1);
}
else
{
y.push_back(1);
}
}
return y;
}
vector< vector<float> > getIrisX()
{
ifstream af;
ifstream bf;
ifstream cf;
ifstream df;
af.open("a.data");
bf.open("b.data");
cf.open("c.data");
df.open("d.data");
vector< vector<float> > X;
for (int i = 0; i < 100; i++)
{
char scrap;
int scrapN;
af >> scrapN;
bf >> scrapN;
cf >> scrapN;
df >> scrapN;
af >> scrap;
bf >> scrap;
cf >> scrap;
df >> scrap;
float a, b, c, d;
af >> a;
bf >> b;
cf >> c;
df >> d;
X.push_back(vector < float > {a, b, c, d});
}
af.close();
bf.close();
cf.close();
df.close();
return X;
}
The way I imported the iris dataset isn't really ideal, but I just wanted something that worked.
The data files can be found here.
I hope that you found this helpful!
Note: The code above is there only as an example. As noted by juzzlin, it is important that you use const vector<float> &X and, in general, pass the vector/vector<vector> objects by reference, because the data can be very large; passing it by value will make a copy of it (which is inefficient).
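A sketch of what the const-reference versions of the signatures would look like:
float netInput(const vector<float>& X) const;
int predict(const vector<float>& X) const;
void fit(const vector< vector<float> >& X, const vector<float>& y);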
Sounds to me like you are struggling with backprop; what you describe above doesn't quite match how I understand it to work, and your description is a bit ambiguous.
You calculate the output error term to backpropagate as the difference between the prediction and the actual value, multiplied by the derivative of the transfer function. It is that error value which you then propagate backwards. The derivative of a sigmoid is calculated quite simply as y(1-y), where y is your output value. There are lots of proofs of that available on the web.
For a node on the inner layer, you multiply that output error by the weight between the two nodes, and sum all those products as the total error from the outer layer being propagated to the node in the inner layer. The error associated with the inner node is then multiplied by the derivative of the transfer function applied to the original output value. Here's some pseudocode:
total_error = sum(output_errors * weights)
node_error = sigmoid_derivative(node_output) * total_error
This error is then propagated backwards in the same manner right back through the input layer weights.
The weights are adjusted using these error terms and the output values of the nodes
weight_change = outer_error * inner_output_value
The learning rate is important because the weight change is calculated for every pattern/row/observation in the input data. You want to moderate the weight change for each row so that the weights don't get unduly changed by any single row and so that all rows have an effect on the weights. The learning rate gives you that, and you adjust the weight change by multiplying by it:
weight_change = outer_error * inner_output_value * learning_rate
It is also normal to remember these changes between epochs (iterations) and to add a fraction of them to the new change. The fraction added is called momentum; it is supposed to speed you up through regions of the error surface where there is not much change and slow you down where there is detail.
weight_change = (outer_error*inner_output_value*learning_rate) + (last_change*momentum)
There are algorithms for adjusting the learning rate and momentum as the training proceeds.
The weight is then updated by adding the change
new_weight = old_weight + weight_change
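Putting those pieces together, here is a tiny C++ sketch of the update for a single inner-layer weight (all names are illustrative only, not from your code):
// outer_errors / weights_to_outer: error terms of, and weights to, the layer above;
// node_output: this node's output; input_value: the value feeding this weight.
double total_error = 0.0;
for (size_t k = 0; k < outer_errors.size(); ++k)
total_error += outer_errors[k] * weights_to_outer[k];
double node_error = node_output * (1.0 - node_output) * total_error; // sigmoid derivative y(1-y)
double weight_change = node_error * input_value * learning_rate
+ last_change * momentum;
weight += weight_change;
last_change = weight_change;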
I had a look through your code, but rather than correct it and post that I thought it was better to describe back prop for you so you can code it up yourself. If you understand it you'll be able to tune it for your circumstances too.
HTH and good luck.
How about this open-source code? It defines a simple 1-hidden-layer net (2 inputs, 2 hidden, 1 output) and solves the XOR problem:
https://web.archive.org/web/20131105002125/http://www.sylbarth.com/mlp.php
What about a simple function-approximation network, like one that predicts and fits a sine function? Also, I think avoiding classes during implementation is a must for getting the basics easily. Let's consider a single-hidden-layer network.
//Had a lot of trouble with shuffle
#include <iostream>
#include<vector>
#include <list>
#include <cstdlib>
#include <math.h>
#define PI 3.141592653589793238463
#define N
#define epsilon 0.1
#define epoch 2000
using namespace std;
// Just for GNU Plot issues
extern "C" FILE *popen(const char *command, const char *mode);
// Defining activation functions
//double sigmoid(double x) { return 1.0f / (1.0f + exp(-x)); }
//double dsigmoid(double x) { return x * (1.0f - x); }
double tanh(double x) { return (exp(x)-exp(-x))/(exp(x)+exp(-x)) ;}
double dtanh(double x) {return 1.0f - x*x ;}
double lin(double x) { return x;}
double dlin(double x) { return 1.0f;}
double init_weight() { return (2.0*rand()/RAND_MAX - 1); } // random weight in (-1, 1); 2.0 forces floating-point division
double MAXX = -9999999999999999; //maximum value of input example
// Network Configuration
static const int numInputs = 1;
static const int numHiddenNodes = 7;
static const int numOutputs = 1;
// Learning Rate
const double lr = 0.05f;
double hiddenLayer[numHiddenNodes];
double outputLayer[numOutputs];
double hiddenLayerBias[numHiddenNodes];
double outputLayerBias[numOutputs];
double hiddenWeights[numInputs][numHiddenNodes];
double outputWeights[numHiddenNodes][numOutputs];
static const int numTrainingSets = 50;
double training_inputs[numTrainingSets][numInputs];
double training_outputs[numTrainingSets][numOutputs];
// Shuffling the data with each epoch
void shuffle(int *array, size_t n)
{
if (n > 1) //If no. of training examples > 1
{
size_t i;
for (i = 0; i < n - 1; i++)
{
size_t j = i + rand() / (RAND_MAX / (n - i) + 1);
int t = array[j];
array[j] = array[i];
array[i] = t;
}
}
}
// Forward Propagation. Only used after training is done.
void predict(double test_sample[])
{
for (int j=0; j<numHiddenNodes; j++)
{
double activation=hiddenLayerBias[j];
for (int k=0; k<numInputs; k++)
{
activation+=test_sample[k]*hiddenWeights[k][j];
}
hiddenLayer[j] = tanh(activation);
}
for (int j=0; j<numOutputs; j++)
{
double activation=outputLayerBias[j];
for (int k=0; k<numHiddenNodes; k++)
{
activation+=hiddenLayer[k]*outputWeights[k][j];
}
outputLayer[j] = lin(activation);
}
//std::cout<<outputLayer[0]<<"\n";
//return outputLayer[0];
//std::cout << "Input:" << training_inputs[i][0] << " " << training_inputs[i][1] << " Output:" << outputLayer[0] << " Expected Output: " << training_outputs[i][0] << "\n";
}
int main(int argc, const char * argv[])
{
///TRAINING DATA GENERATION
for (int i = 0; i < numTrainingSets; i++)
{
double p = (2*PI*(double)i/numTrainingSets);
training_inputs[i][0] = (p);
training_outputs[i][0] = sin(p);
///FINDING NORMALIZING FACTOR
for(int m=0; m<numInputs; ++m)
if(MAXX < training_inputs[i][m])
MAXX = training_inputs[i][m];
for(int m=0; m<numOutputs; ++m)
if(MAXX < training_outputs[i][m])
MAXX = training_outputs[i][m];
}
///NORMALIZING
for (int i = 0; i < numTrainingSets; i++)
{
for(int m=0; m<numInputs; ++m)
training_inputs[i][m] /= 1.0f*MAXX;
for(int m=0; m<numOutputs; ++m)
training_outputs[i][m] /= 1.0f*MAXX;
cout<<"In: "<<training_inputs[i][0]<<" out: "<<training_outputs[i][0]<<endl;
}
///WEIGHT & BIAS INITIALIZATION
for (int i=0; i<numInputs; i++) {
for (int j=0; j<numHiddenNodes; j++) {
hiddenWeights[i][j] = init_weight();
}
}
for (int i=0; i<numHiddenNodes; i++) {
hiddenLayerBias[i] = init_weight();
for (int j=0; j<numOutputs; j++) {
outputWeights[i][j] = init_weight();
}
}
for (int i=0; i<numOutputs; i++) {
//outputLayerBias[i] = init_weight();
outputLayerBias[i] = 0;
}
///FOR INDEX SHUFFLING
int trainingSetOrder[numTrainingSets];
for(int j=0; j<numTrainingSets; ++j) // one index per training example
trainingSetOrder[j] = j;
///TRAINING
//std::cout<<"start train\n";
vector<double> performance, epo; ///STORE MSE, EPOCH
for (int n=0; n < epoch; n++)
{
double MSE = 0;
shuffle(trainingSetOrder,numTrainingSets);
std::cout<<"epoch :"<<n<<"\n";
for (int i=0; i<numTrainingSets; i++)
{
int x = trainingSetOrder[i]; // pick training examples in the shuffled order
//std::cout<<"Training Set :"<<x<<"\n";
/// Forward pass
for (int j=0; j<numHiddenNodes; j++)
{
double activation=hiddenLayerBias[j];
//std::cout<<"Training Set :"<<x<<"\n";
for (int k=0; k<numInputs; k++) {
activation+=training_inputs[x][k]*hiddenWeights[k][j];
}
hiddenLayer[j] = tanh(activation);
}
for (int j=0; j<numOutputs; j++) {
double activation=outputLayerBias[j];
for (int k=0; k<numHiddenNodes; k++)
{
activation+=hiddenLayer[k]*outputWeights[k][j];
}
outputLayer[j] = lin(activation);
}
//std::cout << "Input:" << training_inputs[x][0] << " " << " Output:" << outputLayer[0] << " Expected Output: " << training_outputs[x][0] << "\n";
for(int k=0; k<numOutputs; ++k)
MSE += (1.0f/numOutputs)*pow( training_outputs[x][k] - outputLayer[k], 2);
/// Backprop
/// For V
double deltaOutput[numOutputs];
for (int j=0; j<numOutputs; j++) {
double errorOutput = (training_outputs[i][j]-outputLayer[j]);
deltaOutput[j] = errorOutput*dlin(outputLayer[j]);
}
/// For W
double deltaHidden[numHiddenNodes];
for (int j=0; j<numHiddenNodes; j++) {
double errorHidden = 0.0f;
for(int k=0; k<numOutputs; k++) {
errorHidden+=deltaOutput[k]*outputWeights[j][k];
}
deltaHidden[j] = errorHidden*dtanh(hiddenLayer[j]);
}
///Updation
/// For V and b
for (int j=0; j<numOutputs; j++) {
//b
outputLayerBias[j] += deltaOutput[j]*lr;
for (int k=0; k<numHiddenNodes; k++)
{
outputWeights[k][j]+= hiddenLayer[k]*deltaOutput[j]*lr;
}
}
/// For W and c
for (int j=0; j<numHiddenNodes; j++) {
//c
hiddenLayerBias[j] += deltaHidden[j]*lr;
//W
for(int k=0; k<numInputs; k++) {
hiddenWeights[k][j]+=training_inputs[i][k]*deltaHidden[j]*lr;
}
}
}
//Averaging the MSE
MSE /= 1.0f*numTrainingSets;
//cout<< " MSE: "<< MSE<<endl;
///Steps to PLOT PERFORMANCE PER EPOCH
performance.push_back(MSE*100);
epo.push_back(n);
}
// Print weights
std::cout << "Final Hidden Weights\n[ ";
for (int j=0; j<numHiddenNodes; j++) {
std::cout << "[ ";
for(int k=0; k<numInputs; k++) {
std::cout << hiddenWeights[k][j] << " ";
}
std::cout << "] ";
}
std::cout << "]\n";
std::cout << "Final Hidden Biases\n[ ";
for (int j=0; j<numHiddenNodes; j++) {
std::cout << hiddenLayerBias[j] << " ";
}
std::cout << "]\n";
std::cout << "Final Output Weights";
for (int j=0; j<numOutputs; j++) {
std::cout << "[ ";
for (int k=0; k<numHiddenNodes; k++) {
std::cout << outputWeights[k][j] << " ";
}
std::cout << "]\n";
}
std::cout << "Final Output Biases\n[ ";
for (int j=0; j<numOutputs; j++) {
std::cout << outputLayerBias[j] << " ";
}
std::cout << "]\n";
/* This part is just for plotting the results.
This requires installing GNU Plot. You can also comment it out.
*/
//Plot the results
vector<float> x;
vector<float> y1, y2;
//double test_input[1000][numInputs];
int numTestSets = numTrainingSets;
for (float i = 0; i < numTestSets; i=i+0.25)
{
double p = (2*PI*(double)i/numTestSets);
x.push_back(p);
y1.push_back(sin(p));
double test_input[1];
test_input[0] = p/MAXX;
predict(test_input);
y2.push_back(outputLayer[0]*MAXX);
}
FILE * gp = popen("gnuplot", "w");
fprintf(gp, "set terminal wxt size 600,400 \n");
fprintf(gp, "set grid \n");
fprintf(gp, "set title '%s' \n", "f(x) = x sin (x)");
fprintf(gp, "set style line 1 lt 3 pt 7 ps 0.1 lc rgb 'green' lw 1 \n");
fprintf(gp, "set style line 2 lt 3 pt 7 ps 0.1 lc rgb 'red' lw 1 \n");
fprintf(gp, "plot '-' w p ls 1, '-' w p ls 2 \n");
///Exact f(x) = sin(x) -> Green Graph
for (int k = 0; k < x.size(); k++) {
fprintf(gp, "%f %f \n", x[k], y1[k]);
}
fprintf(gp, "e\n");
///Neural network approximation of f(x) = sin(x) -> Red Graph
for (int k = 0; k < x.size(); k++) {
fprintf(gp, "%f %f \n", x[k], y2[k]);
}
fprintf(gp, "e\n");
fflush(gp);
///FILE POINTER FOR SECOND PLOT (PERFORMANCE GRAPH)
FILE * gp1 = popen("gnuplot", "w");
fprintf(gp1, "set terminal wxt size 600,400 \n");
fprintf(gp1, "set grid \n");
fprintf(gp1, "set title '%s' \n", "Performance");
fprintf(gp1, "set style line 1 lt 3 pt 7 ps 0.1 lc rgb 'green' lw 1 \n");
fprintf(gp1, "set style line 2 lt 3 pt 7 ps 0.1 lc rgb 'red' lw 1 \n");
fprintf(gp1, "plot '-' w p ls 1 \n");
for (int k = 0; k < epo.size(); k++) {
fprintf(gp1, "%f %f \n", epo[k], performance[k]);
}
fprintf(gp1, "e\n");
fflush(gp1);
system("pause");
//_pclose(gp);
return 0;
}
I too have been trying to learn simple (shallow) neural networks while avoiding any high-level tools. I have tried to maintain some of my learning at this repository.