Neural Network Accuracy Plateauing

Neural Network Accuracy Plateauing - c++

I am making a neural network for the purpose of identifying letters. Currently, during training, the network seems to plateau at around 12% accuracy. As input, the network takes a 10x10 image (formatted as a 100x1 column vector) and outputs a 26x1 column vector where each element corresponds to a different letter. Right now I don't have a great data set (only 50 samples) but I iterate over it a few hundred times, and each iteration the accuracy doesn't really get any better than 6 / 50 correct. What I consider a correct identification is the element that corresponds to the correct letter being the greatest number in the vector. I was hoping to get a decently good accuracy before moving on and expanding the data set.
ML::Matrix ML::NeuralNetwork::calculate(const Matrix & input)
{
//all inputs and layers are column vectors
//weights and biases are std::vector of ML::Matrix
Matrix resultant = input;
results.add(resultant); //circular linked list to store the intermediate results
for (int i = 0; i < weights.size(); ++i) {
resultant = (weights[i] * resultant) + biases[i];
resultant.function(sigmoid); //apply sigmoid to every element in the matrix
results.add(resultant);
}
return resultant;
}
void ML::NeuralNetwork::learn(const Matrix & calc, const Matrix & real)
{
//backpropagation
ML::Matrix cost = 2 * (calc - real); //derivative of cost function: (calc - real)^2
for (int i = weights.size() - 1; i >= 0; --i) {
ML::Matrix dCdB = cost.hadamardProduct(ML::sigDerivative(weights[i] * results[i] + biases[i]));
ML::Matrix dCdW = dCdB * results[i].transpose();
cost = weights[i].transpose() * dCdB;
weights[i] -= learningRate * dCdW;
biases[i] -= learningRate * dCdB;
}
}
ML::Matrix ML::Matrix::operator*(const Matrix & other) const throw(ML::MathUndefinedException)
{
//naive matrix-multiplication and matrix-vector product
if (columns != other.rows) throw MathUndefinedException();
Matrix output(rows, other.columns);
if (other.columns == 1) {
for (int i = 0; i < rows; ++i) {
for (int j = 0; j < columns; ++j)
output.set(i, output.get(i) + get(i, j) * other.get(j));
}
}
else {
for (int i = 0; i < rows; ++i) {
for (int j = 0; j < columns; ++j) {
for (int k = 0; k < other.rows; ++k) {
output.set(i, j, output.get(i, j) + get(i, k) * other.get(k, j));
}
}
}
}
return output;
}
My network does work better with simpler examples. In a test with 3 inputs and 1 output it plateaus at about 70% and in another test with only 1 input and 1 output it would get around 99% accuracy so I am not certain if there is a problem with the code. While the code is abstracted for n layers of any size, I have been testing with around 1 - 2 hidden layers (total of 3 - 4 layers). I have tested various training rates, even non constant and differential training rates. I have tested each individual matrix manipulation function on its own (hadamardProduct, transposing, matrix addition etc.) so I am almost certain the problem isn't in one of those functions (thus, I didn't show their code with the exception of matrix multiplication)
All help will be appreciated

Related

Algorithm on hexagonal grid

Hexagonal grid is represented by a two-dimensional array with R rows and C columns. First row always comes "before" second in hexagonal grid construction (see image below). Let k be the number of turns. Each turn, an element of the grid is 1 if and only if the number of neighbours of that element that were 1 the turn before is an odd number. Write C++ code that outputs the grid after k turns.
Limitations:
1 <= R <= 10, 1 <= C <= 10, 1 <= k <= 2^(63) - 1
An example with input (in the first row are R, C and k, then comes the starting grid):
4 4 3
0 0 0 0
0 0 0 0
0 0 1 0
0 0 0 0
Simulation: image, yellow elements represent '1' and blank represent '0'.
This problem is easy to solve if I simulate and produce a grid each turn, but with big enough k it becomes too slow. What is the faster solution?
EDIT: code (n and m are used instead R and C) :
#include <cstdio>
#include <cstring>
using namespace std;
int old[11][11];
int _new[11][11];
int n, m;
long long int k;
int main() {
scanf ("%d %d %lld", &n, &m, &k);
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) scanf ("%d", &old[i][j]);
}
printf ("\n");
while (k) {
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
int count = 0;
if (i % 2 == 0) {
if (i) {
if (j) count += old[i-1][j-1];
count += old[i-1][j];
}
if (j) count += (old[i][j-1]);
if (j < m-1) count += (old[i][j+1]);
if (i < n-1) {
if (j) count += old[i+1][j-1];
count += old[i+1][j];
}
}
else {
if (i) {
if (j < m-1) count += old[i-1][j+1];
count += old[i-1][j];
}
if (j) count += old[i][j-1];
if (j < m-1) count += old[i][j+1];
if (i < n-1) {
if (j < m-1) count += old[i+1][j+1];
count += old[i+1][j];
}
}
if (count % 2) _new[i][j] = 1;
else _new[i][j] = 0;
}
}
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) old[i][j] = _new[i][j];
}
k--;
}
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
printf ("%d", old[i][j]);
}
printf ("\n");
}
return 0;
}

For a given R and C, you have N=R*C cells.
If you represent those cells as a vector of elements in GF(2), i.e, 0s and 1s where arithmetic is performed mod 2 (addition is XOR and multiplication is AND), then the transformation from one turn to the next can be represented by an N*N matrix M, so that:
turn[i+1] = M*turn[i]
You can exponentiate the matrix to determine how the cells transform over k turns:
turn[i+k] = (M^k)*turn[i]
Even if k is very large, like 2^63-1, you can calculate M^k quickly using exponentiation by squaring: https://en.wikipedia.org/wiki/Exponentiation_by_squaring This only takes O(log(k)) matrix multiplications.
Then you can multiply your initial state by the matrix to get the output state.
From the limits on R, C, k, and time given in your question, it's clear that this is the solution you're supposed to come up with.

There are several ways to speed up your algorithm.
You do the neighbour-calculation with the out-of bounds checking in every turn. Do some preprocessing and calculate the neighbours of each cell once at the beginning. (Aziuth has already proposed that.)
Then you don't need to count the neighbours of all cells. Each cell is on if an odd number of neighbouring cells were on in the last turn and it is off otherwise.
You can think of this differently: Start with a clean board. For each active cell of the previous move, toggle the state of all surrounding cells. When an even number of neighbours cause a toggle, the cell is on, otherwise the toggles cancel each other out. Look at the first step of your example. It's like playing Lights Out, really.
This method is faster than counting the neighbours if the board has only few active cells and its worst case is a board whose cells are all on, in which case it is as good as neighbour-counting, because you have to touch each neighbours for each cell.
The next logical step is to represent the board as a sequence of bits, because bits already have a natural way of toggling, the exclusive or or xor oerator, ^. If you keep the list of neigbours for each cell as a bit mask m, you can then toggle the board b via b ^= m.
These are the improvements that can be made to the algorithm. The big improvement is to notice that the patterns will eventually repeat. (The toggling bears resemblance with Conway's Game of Life, where there are also repeating patterns.) Also, the given maximum number of possible iterations, 2⁶³ is suspiciously large.
The playing board is small. The example in your question will repeat at least after 2¹⁶ turns, because the 4×4 board can have at most 2¹⁶ layouts. In practice, turn 127 reaches the ring pattern of the first move after the original and it loops with a period of 126 from then.
The bigger boards may have up to 2¹⁰⁰ layouts, so they may not repeat within 2⁶³ turns. A 10×10 board with a single active cell near the middle has ar period of 2,162,622. This may indeed be a topic for a maths study, as Aziuth suggests, but we'll tacke it with profane means: Keep a hash map of all previous states and the turns where they occurred, then check whether the pattern has occurred before in each turn.
We now have:
a simple algorithm for toggling the cells' state and
a compact bitwise representation of the board, which allows us to create a hash map of the previous states.
Here's my attempt:
#include <iostream>
#include <map>
/*
* Bit representation of a playing board, at most 10 x 10
*/
struct Grid {
unsigned char data[16];
Grid() : data() {
}
void add(size_t i, size_t j) {
size_t k = 10 * i + j;
data[k / 8] |= 1u << (k % 8);
}
void flip(const Grid &mask) {
size_t n = 13;
while (n--) data[n] ^= mask.data[n];
}
bool ison(size_t i, size_t j) const {
size_t k = 10 * i + j;
return ((data[k / 8] & (1u << (k % 8))) != 0);
}
bool operator<(const Grid &other) const {
size_t n = 13;
while (n--) {
if (data[n] > other.data[n]) return true;
if (data[n] < other.data[n]) return false;
}
return false;
}
void dump(size_t n, size_t m) const {
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
std::cout << (ison(i, j) ? 1 : 0);
}
std::cout << '\n';
}
std::cout << '\n';
}
};
int main()
{
size_t n, m, k;
std::cin >> n >> m >> k;
Grid grid;
Grid mask[10][10];
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
int x;
std::cin >> x;
if (x) grid.add(i, j);
}
}
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
Grid &mm = mask[i][j];
if (i % 2 == 0) {
if (i) {
if (j) mm.add(i - 1, j - 1);
mm.add(i - 1, j);
}
if (j) mm.add(i, j - 1);
if (j < m - 1) mm.add(i, j + 1);
if (i < n - 1) {
if (j) mm.add(i + 1, j - 1);
mm.add(i + 1, j);
}
} else {
if (i) {
if (j < m - 1) mm.add(i - 1, j + 1);
mm.add(i - 1, j);
}
if (j) mm.add(i, j - 1);
if (j < m - 1) mm.add(i, j + 1);
if (i < n - 1) {
if (j < m - 1) mm.add(i + 1, j + 1);
mm.add(i + 1, j);
}
}
}
}
std::map<Grid, size_t> prev;
std::map<size_t, Grid> pattern;
for (size_t turn = 0; turn < k; turn++) {
Grid next;
std::map<Grid, size_t>::const_iterator it = prev.find(grid);
if (1 && it != prev.end()) {
size_t start = it->second;
size_t period = turn - start;
size_t index = (k - turn) % period;
grid = pattern[start + index];
break;
}
prev[grid] = turn;
pattern[turn] = grid;
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
if (grid.ison(i, j)) next.flip(mask[i][j]);
}
}
grid = next;
}
for (size_t i = 0; i < n; i++) {
for (size_t j = 0; j < m; j++) {
std::cout << (grid.ison(i, j) ? 1 : 0);
}
std::cout << '\n';
}
return 0;
}
There is probably room for improvement. Especially, I'm not so sure how it fares for big boards. (The code above uses an ordered map. We don't need the order, so using an unordered map will yield faster code. The example above with a single active cell on a 10×10 board took significantly longer than a second with an ordered map.)

Not sure about how you did it - and you should really always post code here - but let's try to optimize things here.
First of all, there is not really a difference between that and a quadratic grid. Different neighbor relationships, but I mean, that is just a small translation function. If you have a problem there, we should treat this separately, maybe on CodeReview.
Now, the naive solution is:
for all fields
count neighbors
if odd: add a marker to update to one, else to zero
for all fields
update all fields by marker of former step
this is obviously in O(N). Iterating twice is somewhat twice the actual run time, but should not be that bad. Try not to allocate space every time that you do that but reuse existing structures.
I'd propose this solution:
at the start:
create a std::vector or std::list "activated" of pointers to all fields that are activated
each iteration:
create a vector "new_activated"
for all items in activated
count neighbors, if odd add to new_activated
for all items in activated
set to inactive
replace activated by new_activated*
for all items in activated
set to active
*this can be done efficiently by putting them in a smart pointer and use move semantics
This code only works on the activated fields. As long as they stay within some smaller area, this is far more efficient. However, I have no idea when this changes - if there are activated fields all over the place, this might be less efficient. In that case, the naive solution might be the best one.
EDIT: after you now posted your code... your code is quite procedural. This is C++, use classes and use representation of things. Probably you do the search for neighbors right, but you can easily make mistakes there and therefore should isolate that part in a function, or better method. Raw arrays are bad and variables like n or k are bad. But before I start tearing your code apart, I instead repeat my recommendation, put the code on CodeReview, having people tear it apart until it is perfect.

This started off as a comment, but I think it could be helpful as an answer in addition to what has already been stated.
You stated the following limitations:
1 <= R <= 10, 1 <= C <= 10
Given these restrictions, I'll take the liberty to can represent the grid/matrix M of R rows and C columns in constant space (i.e. O(1)), and also check its elements in O(1) instead of O(R*C) time, thus removing this part from our time-complexity analysis.
That is, the grid can simply be declared as bool grid[10][10];.
The key input is the large number of turns k, stated to be in the range:
1 <= k <= 2^(63) - 1
The problem is that, AFAIK, you're required to perform k turns. This makes the algorithm be in O(k). Thus, no proposed solution can do better than O(k)[1].
To improve the speed in a meaningful way, this upper-bound must be lowered in some way[1], but it looks like this cannot be done without altering the problem constraints.
Thus, no proposed solution can do better than O(k)[1].
The fact that k can be so large is the main issue. The most anyone can do is improve the rest of the implementation, but this will only improve by a constant factor; you'll have to go through k turns regardless of how you look at it.
Therefore, unless some clever fact and/or detail is found that allows this bound to be lowered, there's no other choice.
[1] For example, it's not like trying to determine if some number n is prime, where you can check all numbers in the range(2, n) to see if they divide n, making it a O(n) process, or notice that some improvements include only looking at odd numbers after checking n is not even (constant factor; still O(n)), and then checking odd numbers only up to √n, i.e., in the range(3, √n, 2), which meaningfully lowers the upper-bound down to O(√n).

Calculating the determinant of a matrix

I am trying to calculate the determinant of a square matrix using row operations.
I ran into this code but I do not really understand how it works.
What do subi and subj do? Does it use row operations?
What is the logic behind this code?
int c, subi, i, j, subj;
double submat[10][10],d=0;
if (n == 2) {
return((mat[0][0] * mat[1][1]) - (mat[1][0] * mat[0][1]));
}
else {
for (c = 0; c < n; c++) {
subi = 0;
for (int i = 1; i < n; i++) {
subj = 0;
for (j = 0; j < n; j++) {
if (j == c)
continue;
submat[subi][subj] = mat[i][j];
subj++;
}
subi++;
}
d = d + (pow(-1, c)*mat[0][c] * determinant(n - 1, submat));
}
}
return d;

The function, which looks like:
double determinant(int n, double mat[10][10]);
recursively goes through rows and calls itself on the submatrices for that row and the first column return a value for all by matrices. The recursion ends for 2 by 2 matrices.

This is a recursive function using Laplace expansion to calculate the determinant whose base case is a 2 by 2 matrix.
However, it does not seem to be a good program to me for:
what if the input is a 1 by 1 matrix
submat is limited by size of 10 by 10
submat is a waste of memory
When matrix is large, it is better to use LU decomposition.

code optimization histogram c++ from matlab

LIBIQTOOL_API void Hist(std::vector<double>input, std::vector<double> bins, std::vector<double>& histogram)
{
double minY = *std::min_element(std::begin(input), std::end(input));
double maxY = *std::max_element(std::begin(input), std::end(input));
std::vector<double> edges;
edges.push_back(-1 * std::numeric_limits<double>::infinity());
for (int i = 0; i < bins.size() - 1; i++)
{
edges.push_back(bins[i] + 0.0100 / 2);
}
edges.push_back(std::numeric_limits<double>::infinity());
//histC
histogram.resize(edges.size() - 1);
#pragma omp parallel for
for (int i = 0; i < input.size(); i++)
{
for (int j = 0; j < edges.size() - 1; j++)
{
if ((edges[j] < input[i]) && (input[i] <= edges[j + 1]))
{
histogram[j] = histogram[j] + 1;
break;
}
}
}
histogram[histogram.size() - 1] = histogram[histogram.size() - 1] + histogram[histogram.size() - 2];
histogram.pop_back();
}
the input vector is size 3,000,000++ and the number of bins is ~7000.
I have taken Matlab's Hist() function and created the code I need in c++.
however it take very long to run, can you see more optimizations for runtime which can be done here?
I did:
a. break when you find the bin to place the current number
b. use openMP

Possible optimizations:
do not pass your input data by value, but by const reference
Do not check lower bound, only upper bound for each bin when doing the linear search for the correct bin.
Alternatively: Since your bins are ordered monotonously and there are no gaps, do a binary search for the correct bin, not a linear search.
The last one should give you the greatest gains, the others are more trivial to implement.
Btw the way you fill the edges vector looks strange.

Speed up recursive determinant algorithm

How do I speed up this recursive function? When it reaches a 10x10 matrix, it takes up a minute or so just to solve a problem. I included the event function as well so you can see when the calculation would take place.
void determinantsFrame::OnCalculateClick(wxCommandEvent &event)
{
double elem[MAX][MAX]; double det; string test; bool doIt = true;
for (int i = 0; i < n; i++)
{
for (int j = 0; j < n; j++)
{
test = (numbers[i][j]->GetValue()).mb_str();
if (test == "")
{
doIt = false;
break;
}
for (int k = 0; k < test.length(); k++)
if (isalpha(test[k]) || test[k] == ' ')
{
doIt = false;
break;
}
else if (ispunct(test[k]))
{
if (test[k] == '.' && test.length() == 1)
doIt = false;
else if (test[k] == '.' && test.length() != 1)
doIt = true;
else if (test[k] != '.')
doIt = false;
}
if (doIt == false)
break;
}
if (doIt == false)
break;
}
if (doIt)
{
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
elem[i][j] = static_cast<double>(wxAtof(numbers[i][j]->GetValue()));
det = determinant(elem, n);
wxMessageBox(wxString::Format(wxT("The determinant is: %.4lf"),det));
}
else
wxMessageBox(wxT("You may have entered an invalid character. Please try again"));
}
double determinantsFrame::determinant(double matrix[MAX][MAX], int order) // Here's the recursive algorithm
{
double det = 0; double temp[MAX][MAX]; int row, col;
if (order == 1)
return matrix[0][0];
else if (order == 2)
return ((matrix[0][0] * matrix[1][1]) - (matrix[0][1] * matrix[1][0]));
else
{
for (int r = 0; r < order; r++)
{
col = 0; row = 0;
for (int i = 1; i < order; i++)
{
for (int j = 0; j < order; j++)
{
if (j == r)
continue;
temp[row][col] = matrix[i][j];
col++;
if (col == order - 1)
col = 0;
}
row++;
}
det = det + (matrix[0][r] * pow(-1, r) * determinant(temp, order - 1));
}
return det;
}
}

You can do a bit better with keeping the same algorithm but it is at least O(n!) (probably worse) so higher order matrices will be slow no matter how much you optimize it. Note I did the benchmark times in MSVC 2010 and are there only for rough comparison purposes. Each change is cumulative as you go down the list and is compared to the original algorithm.
Skip Col Check -- As Surt suggested, removing this gets us a speed increase of 1%.
Add 3x3 Case -- Adding another explicit check for a 3x3 matrix gets us the most, 55%
Change pow() -- Changing the pow() call to (r % 2 ? -1.0 : 1.0) gets us a little bit more, 57%
Change to switch -- Changing the order check to a switch gets us a little bit more, 58%
Add 4x4 Case -- Adding another explicit check for a 4x4 matrix gets more, 85%
Things that don't work include:
memcpy -- As Surt suggested this actually looses a good deal of speed, -100%
Threads -- Creating order threads doesn't work well at all, -160%
I was hoping that using threads could get us a significant performance increase but even with all the optimization it is slower than the original. I think the copying of all the memory is making it not very parallel.
Added the 3x3 and 4x4 cases has the most effect and are the primary reason for the over x6 increase in speed. In theory you could add more explicit cases (probably by creating a program to output the required code) to reduce the speed even further. Of course, at some point this kind of defeats the purpose of using a recursive algorithm to begin with.
To get more performance you would probably have to consider a different algorithm. In theory you can change the recursive function into an iterative one by managing your own stack but it is considerable work and you aren't guaranteed a performance increase anyways.

It could be a branch mispredict problem (see also). The test
if (col == order - 1)
col = 0;
Is not needed as far as I can see.
The test fails 1/order times per loop and dominates for small order, which is why larger N aren't so affected. The timing is still large O(N!^3) (afaik) so don't expect miracles.
col = 0; row = 0;
for (int i = 1; i < order; i++) {
for (int j = 0; j < order; j++) {
if (j == r)
continue;
temp[row][col] = matrix[i][j];
col++;
//if (col == order - 1)
// col = 0;
}
col = 0; // no need to test
row++;
}
The algorithm will get a further slowdown when it hit L2 cache, at latest at N=64.
Also the matrix copy might be ineffective, this could be far more effective for large order at the cost of low effectiveness at low order.
for (int r = 0; r < order; r++) {
row = 0;
for (int i = 1; i < order; i++) {
memcpy(temp[row], matrix[i], r*sizeof(double)); // if r==0 will this work?
memcpy(&temp[row][r], &matrix[i][r+1], (order-r-1)*sizeof(double));
// amount of copied elements r+(order-r-1)=order-1.
row++;
}
Make a test with the original code to get the determinant that I got the indexes right!

Some optimization about the code (computing ranks of a vector)?

The following code is a function (performance-critical) to compute tied ranks of a vector:
//The function here is to compute tied-ranks: answers.com/topic/tied-rank
mergeSort(x,inds,ci);
//mergeSort(): to sort vector x of length ci, also returns keys (inds) of x.
int tj=0;
double xi=x[0];
for (int j = 1; j < ci; ++j)
{
if (x[j] > xi)
{
double rankvalue = 0.5 * (j - 1 + tj);
for (int k = tj; k < j; ++k)
{
ranks[inds[k]] = rankvalue;
};
tj = j;
xi = x[j];
};
};
double rankvalue = 0.5 * (ci - 1 + tj);
for (int k = tj; k < ci; ++k)
{
ranks[inds[k]] = rankvalue;
};
The problem is, the supposed performance bottleneck mergeSort(), which is O(NlogN) is several times faster than the other part of codes (which is O(N)), which suggests there is room for huge improvment with the other part of the codes, any advices?

It seems that the algorithm has quadratic behavior: if x[0] is the largest value in the sequence tj stays 0 and you get up to ci iterations internally. Did you mean to use x[inds[0]] and x[inds[j]]?

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js