Neural Network Overestimating output of handwritten digits

Neural Network Overestimating output of handwritten digits - c++

So, I try to create my own neural network. Something really simple.
My input is the MNIST database of handwritten digits.
Input: 28*28 neurons (Images).
Output: 10 neurons (0/1/2/3/4/5/6/7/8/9).
So my network is as follow: 28*28 -> 15 -> 10.
The problem remains in my estimated output. Indeed, it seems I have a gradient explosion.
The output given by my network is here: https://pastebin.com/EFpBGAZd
As you can see, the first estimated output is wrong. So my network adjust the weights thanks to the backpropagation. But It doesn't seems to updates the weights correctly. Indeed the estimated output is too high compared to the second highest value.
So the first estimated output keeps being the best estimated output for the following training (13 in my example).
My backpropagation code:
VOID BP(NETWORK &Network, double Target[OUTPUT_NEURONS]) {
double DeltaETotalOut = 0;
double DeltaOutNet = 0;
double DeltaErrorNet = 0;
double DeltaETotalWeight = 0;
double Error = 0;
double ErrorTotal = 0;
double OutputUpdatedWeights[OUTPUT_NEURONS*HIDDEN_NEURONS] = { 0 };
unsigned int _indexOutput = 0;
double fNetworkError = 0;
//Calculate Error
for (int i = 0; i < OUTPUT_NEURONS; i++) {
fNetworkError += 0.5*pow(Target[i] - Network.OLayer.Cell[i].Output, 2);
}
Network.Error = fNetworkError;
//Output Neurons
for (int i = 0; i < OUTPUT_NEURONS; i++) {
DeltaETotalOut = -(Target[i] - Network.OLayer.Cell[i].Output);
DeltaOutNet = ActivateSigmoidPrime(Network.OLayer.Cell[i].Output);
for (int j = 0; j < HIDDEN_NEURONS; j++) {
OutputUpdatedWeights[_indexOutput] = Network.OLayer.Cell[i].Weight[j] - 0.5 * DeltaOutNet*DeltaETotalOut* Network.HLayer.Cell[j].Output;
_indexOutput++;
}
}
//Hidden Neurons
for (int i = 0; i < HIDDEN_NEURONS; i++) {
ErrorTotal = 0;
for (int k = 0; k < OUTPUT_NEURONS; k++) {
DeltaETotalOut = -(Target[k] - Network.OLayer.Cell[k].Output);
DeltaOutNet = ActivateSigmoidPrime(Network.OLayer.Cell[k].Output);
DeltaErrorNet = DeltaETotalOut * DeltaOutNet;
Error = DeltaErrorNet * Network.OLayer.Cell[k].Weight[i];
ErrorTotal += Error;
}
DeltaOutNet = ActivateSigmoidPrime(Network.HLayer.Cell[i].Output);
for (int j = 0; j < INPUT_NEURONS; j++) {
DeltaETotalWeight = ErrorTotal * DeltaOutNet*Network.ILayer.Image[j];
Network.HLayer.Cell[i].Weight[j] -= 0.5 * DeltaETotalWeight;
}
}
//Update Weights
_indexOutput = 0;
for (int i = 0; i < OUTPUT_NEURONS; i++) {
for (int j = 0; j < HIDDEN_NEURONS; j++) {
Network.OLayer.Cell[i].Weight[j] = OutputUpdatedWeights[_indexOutput];
_indexOutput++;
}
}}
How can I solve this issue?
I didn't worked on the hidden layer nor biases, is it due to it?
Thanks

Well, since Backpropagation is notoriously hard to implement and especially to debug (I guess everyone who did it can relate) it’s much harder to debug some Code written by others.
After a quick view over your code, I’m quite surprised that you calculate a negative delta term? Are you using ReLU or any sigmoid function? I’m quite sure there is more. But I’d suggest you to stay away from MNIST until you got your network to solve XOR.
I’ve wrote a summary in pseudo code on how to implement Backpropagation in pseudo code. I’m sure you’ll be able to translate it into C++ quite easily.
Strange convergence in simple Neural Network

In my experience neural networks should really be implemented with matrix operations. This will make your code faster and easier to debug.
The way to debug backpropagation is to use finite difference. For a loss function J(theta) we can approximate the gradient in each dimension with (J(theta + epsilon*d) - J(theta))/epsilon with d a one-hot vector representing one dimension (note the similarity to a derivative).
https://en.wikipedia.org/wiki/Finite_difference_method

Related

Multi-threading molecular simulations in C++

I am developing a molecular dynamics simulation code in C++, which essentially takes atom positions and other properties as input and simulates their motion under Newton's laws of motion. The core algorithm uses what's called the Velocity Verlet scheme and looks like:
// iterate through time (k=[1,#steps])
double Dt = 0.002; // time step
double Ttot = 1.0; // total time
double halfDt = Dt/2.0;
for (int k = 1; k*Dt <= Ttot; k++){
for (int i = 0; i < number_particles; i++)
vHalf[i] = p[i].velocity + F[i]*halfDt; // step 1
for (int i = 0; i < number_particles; i++)
p[i].position += vHalf[i]*Dt; // step 2
for (int i = 0; i < number_particles; i++)
F[i] = Force(p,i); // recalculate force on all particle i's
for (int i = 0; i < number_particles; i++)
p[i].velocity = vHalf[i] + F[i]*halfDt; // step 3
}
Where p is an array of class objects which store things like particle position, velocity, mass, etc. and Force is a function that calculates the net force on a particle using something like Lennard-Jones potential.
My question regards the time required to complete the calculation; all of my subroutines are optimized in terms of crunching numbers (e.g. using x*x*x to raise to the third power instead of pow(x,3)), but the main issue is the time loop will often be performed for millions of iterations and there are typically close to a million particles. Is there any way to implement this algorithm using multi-threading? From my understanding, multi-threading essentially opens another stream of data to and from a CPU core, which would allow me to run two different simulations at the same time; I would like to use multi-threading to make just one of these simulations run faster

I'd recommend using OpenMP.
Your specific use case is trivially parallelizable.
Prallelization should be as simple as:
double Dt = 0.002; // time step
double Ttot = 1.0; // total time
double halfDt = Dt/2.0;
for (int k = 1; k*Dt <= Ttot; k++){
#pragma omp parallel for
for (int i = 0; i < number_particles; i++)
vHalf[i] = p[i].velocity + F[i]*halfDt; // step 1
p[i].position += vHalf[i]*Dt; // step 2
#pragma omp parallel for
for (int i = 0; i < number_particles; i++)
F[i] = Force(p,i); // recalculate force on all particle i's
p[i].velocity = vHalf[i] + F[i]*halfDt; // step 3
}
Most popular compilers and platforms have support for OpenMP.

Is such a normalization right for wiggly curves?

I am training a neural network (in C++, without any additional library), to learn a random wiggly function:
f(x)=0.2+0.4x2+0.3sin(15x)+0.05cos(50x)
Plotted in Python as:
lim = 500
for i in range(lim):
x.append(i)
p = 2*3.14*i/lim
y.append(0.2+0.4*(p*p)+0.3*p*math.sin(15*p)+0.05*math.cos(50*p))
plt.plot(x,y)
that corresponds to a curve as :
The same neural network has successfully approximated the sine function quite well with a single hidden layer(5 neurons), tanh activation. But, I am unable to understand what's going wrong with the wiggly function. Although the Mean Square Error seems to dip.(**The error has been scaled up by 100 for visibility):
And this is the expected (GREEN) vs predicted (RED) graph.
I doubt the normalization. This is how I did it:
Generated training data as:
int numTrainingSets = 100;
double MAXX = -9999999999999999;
for (int i = 0; i < numTrainingSets; i++)
{
double p = (2*PI*(double)i/numTrainingSets);
training_inputs[i][0] = p; //INSERTING DATA INTO i'th EXAMPLE, 0th INPUT (Single input)
training_outputs[i][0] = 0.2+0.4*pow(p, 2)+0.3*p*sin(15*p)+0.05*cos(50*p); //Single output
///FINDING NORMALIZING FACTOR (IN INPUT AND OUTPUT DATA)
for(int m=0; m<numInputs; ++m)
if(MAXX < training_inputs[i][m])
MAXX = training_inputs[i][m]; //FINDING MAXIMUM VALUE IN INPUT DATA
for(int m=0; m<numOutputs; ++m)
if(MAXX < training_outputs[i][m])
MAXX = training_outputs[i][m]; //FINDING MAXIMUM VALUE IN OUTPUT DATA
///NORMALIZE BOTH INPUT & OUTPUT DATA USING THIS MAXIMUM VALUE
////DO THIS FOR INPUT TRAINING DATA
for(int m=0; m<numInputs; ++m)
training_inputs[i][m] /= MAXX;
////DO THIS FOR OUTPUT TRAINING DATA
for(int m=0; m<numOutputs; ++m)
training_outputs[i][m] /= MAXX;
}
This is what the model trains on. The validation/test data is generated as follows:
int numTestSets = 500;
for (int i = 0; i < numTestSets; i++)
{
//NORMALIZING TEST DATA USING THE SAME "MAXX" VALUE
double p = (2*PI*i/numTestSets)/MAXX;
x.push_back(p); //FORMS THE X-AXIS FOR PLOTTING
///Actual Result
double res = 0.2+0.4*pow(p, 2)+0.3*p*sin(15*p)+0.05*cos(50*p);
y1.push_back(res); //FORMS THE GREEN CURVE FOR PLOTTING
///Predicted Value
double temp[1];
temp[0] = p;
y2.push_back(MAXX*predict(temp)); //FORMS THE RED CURVE FOR PLOTTING, scaled up to de-normalize
}
Is this normalizing right? If yes, what could probably go wrong? If no, what should be done?

There's nothing wrong with using that normalization, unless you use a fancy weight initialization for the neural network. It rather seems that something goes wrong during training but without further details on that side, it's hard to pinpoint the problem.
I ran a quick crosscheck using tensorflow (MSE loss; Adam optimizer) and it does converge in that case:
Here's the code for reference:
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
x = np.linspace(0, 2*np.pi, 500)
y = 0.2 + 0.4*x**2 + 0.3*x*np.sin(15*x) + 0.05*np.cos(50*x)
class Model(tf.keras.Model):
def __init__(self):
super().__init__()
self.h1 = tf.keras.layers.Dense(5, activation='tanh')
self.out = tf.keras.layers.Dense(1, activation=None)
def call(self, x):
return self.out(self.h1(x))
model = Model()
loss_object = tf.keras.losses.MeanSquaredError()
train_loss = tf.keras.metrics.Mean(name='train_loss')
optimizer = tf.keras.optimizers.Adam()
#tf.function
def train_step(x, y):
with tf.GradientTape() as tape:
loss = loss_object(y, model(x))
gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
train_loss(loss)
# Normalize data.
x /= y.max()
y /= y.max()
data_set = tf.data.Dataset.from_tensor_slices((x[:, None], y[:, None]))
train_ds = data_set.shuffle(len(x)).batch(64)
loss_history = []
for epoch in range(5000):
for train_x, train_y in train_ds:
train_step(train_x, train_y)
loss_history.append(train_loss.result())
print(f'Epoch {epoch}, loss: {loss_history[-1]}')
train_loss.reset_states()
plt.figure()
plt.xlabel('Epoch')
plt.ylabel('MSE loss')
plt.plot(loss_history)
plt.figure()
plt.plot(x, y, label='original')
plt.plot(x, model(list(data_set.batch(len(x)))[0][0]), label='predicted')
plt.legend()
plt.show()

I found the case to be not so regular and this was the mistake:
1) I was finding the normalizing factor correctly, but had to change this:
for (int i = 0; i < numTrainingSets; i++)
{
//Find and update Normalization factor(as shown in the question)
//Normalize the training example
}
to
for (int i = 0; i < numTrainingSets; i++)
{
//Find Normalization factor (as shown in the question)
}
for (int i = 0; i < numTrainingSets; i++)
{
//Normalize the training example
}
Also, the validation set was earlier generated as :
int numTestSets = 500;
for (int i = 0; i < numTestSets; i++)
{
//Generate data
double p = (2*PI*i/numTestSets)/MAXX;
//And other steps...
}
whereas the Training data was generated on numTrainingSets = 100. Hence, p generated for training set and the one generated for validation set lies in different range. So, I had to make ** numTestSets = numTrainSets**.
Lastly,
Is this normalizing right?
I had been wrongly normalizing the actual result too!
As shown in the question:
double p = (2*PI*i/numTestSets)/MAXX;
x.push_back(p); //FORMS THE X-AXIS FOR PLOTTING
///Actual Result
double res = 0.2+0.4*pow(p, 2)+0.3*p*sin(15*p)+0.05*cos(50*p);
Notice: the p used to generate this actual result has been normalized (unnecessarily).
This is the final result after resolving these issues...

FFT Spectrum not displaying correctly

I'm currently trying to display an audio spectrum using FFTW3 and SFML. I've followed the directions found here and looked at numerous references on FFT and spectrums and FFTW yet somehow my bars are almost all aligned to the left like below. Another issue I'm having is I can't find information on what the scale of the FFT output is. Currently I'm dividing it by 64 yet it still reaches beyond that occasionally. And further still I have found no information on why the output of the from FFTW has to be the same size as the input. So my questions are:
Why is the majority of my spectrum aligned to the left unlike the image below mine?
Why isn't the output between 0.0 and 1.0?
Why is the input sample count related to the fft output count?
What I get:
What I'm looking for:
const int bufferSize = 256 * 8;
void init() {
sampleCount = (int)buffer.getSampleCount();
channelCount = (int)buffer.getChannelCount();
for (int i = 0; i < bufferSize; i++) {
window.push_back(0.54f - 0.46f * cos(2.0f * GMath::PI * (float)i / (float)bufferSize));
}
plan = fftwf_plan_dft_1d(bufferSize, signal, results, FFTW_FORWARD, FFTW_ESTIMATE);
}
void update() {
int mark = (int)(sound.getPlayingOffset().asSeconds() * sampleRate);
for (int i = 0; i < bufferSize; i++) {
float s = 0.0f;
if (i + mark < sampleCount) {
s = (float)buffer.getSamples()[(i + mark) * channelCount] / (float)SHRT_MAX * window[i];
}
signal[i][0] = s;
signal[i][1] = 0.0f;
}
}
void draw() {
int inc = bufferSize / 2 / size.x;
int y = size.y - 1;
int max = size.y;
for (int i = 0; i < size.x; i ++) {
float total = 0.0f;
for (int j = 0; j < inc; j++) {
int index = i * inc + j;
total += std::sqrt(results[index][0] * results[index][0] + results[index][1] * results[index][1]);
}
total /= (float)(inc * 64);
Rectangle2I rect = Rectangle2I(i, y, 1, -(int)(total * max)).absRect();
g->setPixel(rect, Pixel(254, toColor(BLACK, GREEN)));
}
}

All of your questions are related to the FFT theory. Study the properties of FFT from any standard text/reference book and you will be able to answer your questions all by yourself only.
The least you can start from is here:
https://en.wikipedia.org/wiki/Fast_Fourier_transform.

Many FFT implementations are energy preserving. That means the scale of the output is linearly related to the scale and/or size of the input.
An FFT is a DFT is a square matrix transform. So the number of outputs will always be equal to the number of inputs (or half that by ignoring the redundant complex conjugate half given strictly real input), unless some outputs are thrown away. If not, it's not an FFT. If you want less outputs, there are ways to downsample the FFT output or post process it in other ways.

C++ OpenCV: What is the easiest way to apply 2-D convolution

I have a Kernel filter that I generated and I want to apply it to my image but I could not get a right result by doing this:
Actually I can use a different method as well since I am not to familiar with opencv I need help thanks.
channel[c] is the read image;
int size = 5; // Gaussian filter box side size
double gauss[5][5];
int sidestp = (size - 1) / 2;
// I have a function to generate the gaussiankernel filter
float sum = 0;
for (int x = 1; x < channels[c].cols - 1; x++){
for (int y = 1; y < channels[c].rows - 1; y++){
for (int i = -size; i <= size; i++){
for (int j = -sidestp; j <= sidestp; j++){
sum = sum + gauss[i + sidestp][j + sidestp] * channels[c].at<uchar>(x - i, y - j);
}
}
result.at<uchar>(y, x) = sum;
}
}

OpenCV has an inbuilt function filter2D that does this convolution for you.
You need to provide your source and destination images, along with the custom kernel (as a Mat), and a few more arguments. See this if it still bothers you.

Just to add to the previous answer, since you are performing Gaussian blur, you can use the OpenCV GaussianBlur (Check here). Unlike filter2D, you can use the standard deviations as input parameter.

Particle Deposition Terrain Generation

I'm using Particle Deposition to try and create some volcano-like mountains procedurally but all I'm getting out of it is pyramid-like structures. Is anyone familiar with the algorithm that might be able to shed some light on what I might be doing wrong. I'm dropping each particle in the same place at the moment. If I don't they spread out in a very thin layer rather than any sort of mountain.
void TerrainClass::ParticalDeposition(int loops){
float height = 0.0;
//for(int k= 0; k <10; k++){
int dropX = mCurrentX = rand()%(m_terrainWidth-80) + 40;
int dropY = mCurrentZ = rand()%(m_terrainHeight-80) + 40;
int radius = 15;
float angle = 0;
int tempthing = 0;
loops = 360;
for(int i = 0; i < loops; i++){
mCurrentX = dropX + radius * cos(angle);
mCurrentZ = dropY + radius * sin(angle);
/*f(i%loops/5 == 0){
dropX -= radius * cos(angle);
dropY += radius * sin(angle);
angle+= 0.005;
mCurrentX = dropX;
mCurrentZ = dropY;
}*/
angle += 360/loops;
//dropX += rand()%5;
//dropY += rand()%5;
//for(int j = 0; j < loops; j++){
float newY = 0;
newY = (1 - (2.0f/loops)*i);
if(newY < 0.0f){
newY = 0.0f;
}
DepositParticle(newY);
//}
}
//}
}
void TerrainClass::DepositParticle(float heightIncrease){
bool posFound = false;
m_lowerList.clear();
while(posFound == false){
int offset = 10;
int jitter;
if(Stable(0.5f)){
m_heightMap[(m_terrainHeight*mCurrentZ)+mCurrentX].y += heightIncrease;
posFound = true;
}else{
if(!m_lowerList.empty()){
int element = rand()%m_lowerList.size();
int lowerIndex = m_lowerList.at(element);
MoveTo(lowerIndex);
}
}
}
}
bool TerrainClass::Stable(float deltaHeight){
int index[9];
float height[9];
index[0] = ((m_terrainHeight*mCurrentZ)+mCurrentX); //the current index
index[1] = ValidIndex((m_terrainHeight*mCurrentZ)+mCurrentX+1) ? (m_terrainHeight*mCurrentZ)+mCurrentX+1 : -1; // if the index to the right is valid index set index[] to index else set index[] to -1
index[2] = ValidIndex((m_terrainHeight*mCurrentZ)+mCurrentX-1) ? (m_terrainHeight*mCurrentZ)+mCurrentX-1 : -1; //to the left
index[3] = ValidIndex((m_terrainHeight*(mCurrentZ+1))+mCurrentX) ? (m_terrainHeight*(mCurrentZ+1))+mCurrentX : -1; // above
index[4] = ValidIndex((m_terrainHeight*(mCurrentZ-1))+mCurrentX) ? (m_terrainHeight*(mCurrentZ-1))+mCurrentX : -1; // bellow
index[5] = ValidIndex((m_terrainHeight*(mCurrentZ+1))+mCurrentX+1) ? (m_terrainHeight*(mCurrentZ+1))+mCurrentX+1: -1; // above to the right
index[6] = ValidIndex((m_terrainHeight*(mCurrentZ-1))+mCurrentX+1) ? (m_terrainHeight*(mCurrentZ-1))+mCurrentX+1: -1; // below to the right
index[7] = ValidIndex((m_terrainHeight*(mCurrentZ+1))+mCurrentX-1) ? (m_terrainHeight*(mCurrentZ+1))+mCurrentX-1: -1; // above to the left
index[8] = ValidIndex((m_terrainHeight*(mCurrentZ-1))+mCurrentX-1) ? (m_terrainHeight*(mCurrentZ-1))+mCurrentX-1: -1; // above to the right
for ( int i = 0; i < 9; i++){
height[i] = (index[i] != -1) ? m_heightMap[index[i]].y : -1;
}
m_lowerList.clear();
for(int i = 1; i < 9; i++){
if(height[i] != -1){
if(height[i] < height[0] - deltaHeight){
m_lowerList.push_back(index[i]);
}
}
}
return m_lowerList.empty();
}
bool TerrainClass::ValidIndex(int index){
return (index > 0 && index < m_terrainWidth*m_terrainHeight) ? true : false;
}
void TerrainClass::MoveTo(int index){
mCurrentX = index%m_terrainWidth;
mCurrentZ = index/m_terrainHeight;
}
Thats all the code thats used.

You should have a look at these two papers:
Fast Hydraulic Erosion Simulation and Visualization on GPU
Fast Hydraulic and Thermal Erosion on the GPU (read the first one first, the second one expands on it)
Don't get scared by the "on GPU", the algorithms work just fine on CPU (albeit slower). The algorithms don't do particle sedimentation per se (but you don't either ;) ) - they instead aggregate the particles into several layers of vector fields.
One important thing about this algorithm is that it erodes already existing heightmaps - for example generated with perlin noise. It fails miserably if the initial height field is completely flat (or even if it has insufficient height variation).
I had implemented this algorithm myself and had mostly success with it (still have more work to do, the algorithms are very hard to balance to give universally great results) - see the image below.
Note that perlin noise with the Thermal weathering component from the second paper may be well enough for you (and might save you a lot of trouble).
You can also find C++ CPU-based implementation of this algorithm in my project (specifically this file, mind the GPL license!) and its simplified description on pages 24-29 of my thesis.

Your particles will need to have some surface friction and/or stickiness (or similar) in their physics model if you want them to not spread out into a single-layer. This is performed in the collision detection and collision response parts of your code when updating your particle simulation.
A simple approach is to make the particles stick (attract each-other). Particles need to have a size too so that they don't simply converge to perfectly overlapping. If you want to make them attract each other, then you need to test the distance between particles.
You might benefit from looking through some of the DirectX SDK examples that use particles, and in particular (pun arf!) there is a great demo (by Simon Green?) in the NVidia GPU Computing SDK that implements sticky particles in CUDA. It includes a ReadMe document describing what they've done. You can see how the particles interact and ignore all the CUDA/GPU stuff if you aren't going for massive particle counts.
Also note that as soon as you use inter-particle forces, then you are checking approximately 0.5*n^2 combinations (pairs) of particles...so you may need to use a simple spatial partitioning scheme or similar to limit forces to nearby groups of particles only.
Good luck!

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Neural Network Overestimating output of handwritten digits - c++

Related

Multi-threading molecular simulations in C++

Is such a normalization right for wiggly curves?

FFT Spectrum not displaying correctly

C++ OpenCV: What is the easiest way to apply 2-D convolution

Particle Deposition Terrain Generation

Categories

Resources