Neural Network seems to work fine until used for processing data (all of the results are practically the same) - c++

I have recently implemented a typical 3 layer neural network (input -> hidden -> output) and I'm using the sigmoid function for activation. So far, the host program has 3 modes:
Creation, which seems to work fine. It creates a network with a specified number of input, hidden and output neurons, initializes the weights to either random values or zero.
Training, which loads a dataset, computes the output of the network then backpropagates the error and updates the weights. As far as I can tell, this works ok. The weights change, but not extremely, after training on the dataset.
Processing, which seems to work ok. However, the data output for the dataset which was used for training, or any other dataset for that matter is very bad. It's usually either just a continuuous stream of 1's, with an occasional 0.999999 or every output value for every input is 0.9999 with the last digits being different between inputs. As far as I could tell there was no correlation between those last 2 digits and what was supposed to be outputed.
How should I go about figuring out what's not working right?

You need to find a set of parameters (number of neurons, learning rate, number of iterations for training) that works well for classifying previously unseen data. People often achieve this by separating their data into three groups: training, validation and testing.
Whatever you decide to do, just remember that it really doesn't make sense to be testing on the same data with which you trained, because any classifcation method close to reasonable should be getting everything 100% right under such a setup.

Related

Artificial Neural Network with large inputs & outputs

I've been following Dave Miller's ANN C++ Tutorial, and I've been having some problems getting it to function as expected.
You can view the code I'm working with here. It's an XCode project, but includes the main.cpp and data set file.
Previously, this program would only gives outputs between -1 and 1, I'm presuming due to the use of the tanh function. I've manipulated the data inputs so I can input my data that is much larger and have valid outputs. I've simply done this by multiplying the input values by 0.0001, and multiplying the output values by 10000.
The training data I'm using is the included CSV file. The last column is the expected output, the rest are inputs. Am I using the wrong mathematical function for these data?
Would you say that this is actually learning? This whole thing has stressed me out so much, I understand the theory behind ANN's but just can't implement from scratch for myself.
The net recent average error definitely gets smaller and smaller, which to me would say it is learning.
I'm sorry if I haven't explained myself very well, I'm very new to ANN's and this whole thing is very confusing to me. My university lecturers are useless when it comes to the practical side, they only teach us the theory of it.
I've been playing around with the eta and alpha values, along with the number of hidden layers.
You explained yourself quite well, if the net recent average is getting lower and lower it probably means that the network is actually learning, but here is my suggestion about how to be completely sure.
Take you CSV file and split it into 2 files one should be about 10% of the all data and the other all the remaining.
You start with an untrained network and you run your 10% file trough the net and for each line you save the difference between actual output and expected result.
Then you train the network only with the 90% of the CSV file you have and finally you re run trough the NET the first 10% file again and you compare the differences you had on the first run with the the latest ones.
You should find out that the new results are much closer to the expected values than the first time, and this would be the final proof that your network is learning.
Does this make any sense ? if not please send share some code or send me a link to the exercise you are running and I will try to explain it in code.

How to normalize sequence of numbers?

I am working user behavior project. Based on user interaction I have got some data. There is nice sequence which smoothly increases and decreases over the time. But there are little discrepancies, which are very bad. Please refer to graph below:
You can also find data here:
2.0789 2.09604 2.11472 2.13414 2.15609 2.17776 2.2021 2.22722 2.25019 2.27304 2.29724 2.31991 2.34285 2.36569 2.38682 2.40634 2.42068 2.43947 2.45099 2.46564 2.48385 2.49747 2.49031 2.51458 2.5149 2.52632 2.54689 2.56077 2.57821 2.57877 2.59104 2.57625 2.55987 2.5694 2.56244 2.56599 2.54696 2.52479 2.50345 2.48306 2.50934 2.4512 2.43586 2.40664 2.38721 2.3816 2.36415 2.33408 2.31225 2.28801 2.26583 2.24054 2.2135 2.19678 2.16366 2.13945 2.11102 2.08389 2.05533 2.02899 2.00373 1.9752 1.94862 1.91982 1.89125 1.86307 1.83539 1.80641 1.77946 1.75333 1.72765 1.70417 1.68106 1.65971 1.64032 1.62386 1.6034 1.5829 1.56022 1.54167 1.53141 1.52329 1.51128 1.52125 1.51127 1.50753 1.51494 1.51777 1.55563 1.56948 1.57866 1.60095 1.61939 1.64399 1.67643 1.70784 1.74259 1.7815 1.81939 1.84942 1.87731
1.89895 1.91676 1.92987
I would want to smooth out this sequence. The technique should be able to eliminate numbers with characteristic of X and Y, i.e. error in mono-increasing or mono-decreasing.
If not eliminate, technique should be able to shift them so that series is not affected by errors.
What I have tried and failed:
I tried to test difference between values. In some special cases it works, but for sequence as presented in this the distance between numbers is not such that I can cut out errors
I tried applying a counter, which is some X, then only change is accepted otherwise point is mapped to previous point only. Here I have great trouble deciding on value of X, because this is based on user-interaction, I am not really controller of it. If user interaction is such that its plot would be a zigzag pattern, I am ending up with 'no user movement data detected at all' situation.
Please share the techniques that you are aware of.
PS: Data made available in this example is a particular case. There is no typical pattern in which numbers are going to occure, but we expect some range to be continuous with all the examples. Solution I am seeking is generic.
I do not know how much effort you want to involve in this problem but if you want theoretical guaranties,
topological persistence seems well adapted to your problem imho.
Basically with that method, you can filtrate local maximum/minimum by fixing a scale
and there are theoritical proofs that says that if you sampling is
close from your function, then you extracts correct number of maximums with persistence.
You can see these slides (mainly pages 7-9 to get the idea) to get an idea of the method.
Basically, if you take your points as a landscape and imagine a watershed starting from maximum height and decreasing, you have some picks.
Every pick has a time where it is born which is the time where it becomes emerged and a time where it dies which is when it merges with an higher pick. Now a persistence diagram pictures a point for every pick where its x/y coordinates are its time of birth/death (by assumption the first pick does not die and is not shown).
If a pick is a global maximal, then it will be further from the diagonal in the persistence diagram than a local maximum pick. To remove local maximums you have to remove picks close to the diagonal. There are fours local maximums in your example as you can see with the persistence diagram of your data (thanks for providing the data btw) and two global ones (the first pick is not pictured in a persistence diagram):
If you noise your data like that :
You will still get a very decent persistence diagram that will allow you to filter local maximum as you want :
Please ask if you want more details or references.
Since you can not decide on a cut off frequency, and not even on the filter you want to use, I would implement several, and let the user set the parameters.
The first thing that I thought of is running average, and you can see that there are so many things to set, to get different outputs.

Neural Network gives same output for different inputs, doesn't learn

I have a neural network written in standard C++11 which I believe follows the back-propagation algorithm correctly (based on this). If I output the error in each step of the algorithm, however, it seems to oscillate without dampening over time. I've tried removing momentum entirely and choosing a very small learning rate (0.02), but it still oscillates at roughly the same amplitude per network (with each network having a different amplitude within a certain range).
Further, all inputs result in the same output (a problem I found posted here before, although for a different language. The author also mentions that he never got it working.)
The code can be found here.
To summarize how I have implemented the network:
Neurons hold the current weights to the neurons ahead of them, previous changes to those weights, and the sum of all inputs.
Neurons can have their value (sum of all inputs) accessed, or can output the result of passing said value through a given activation function.
NeuronLayers act as Neuron containers and set up the actual connections to the next layer.
NeuronLayers can send the actual outputs to the next layer (instead of pulling from the previous).
FFNeuralNetworks act as containers for NeuronLayers and manage forward-propagation, error calculation, and back-propagation. They can also simply process inputs.
The input layer of an FFNeuralNetwork sends its weighted values (value * weight) to the next layer. Each neuron in each layer afterwards outputs the weighted result of the activation function unless it is a bias, or the layer is the output layer (biases output the weighted value, the output layer simply passes the sum through the activation function).
Have I made a fundamental mistake in the implementation (a misunderstanding of the theory), or is there some simple bug I haven't found yet? If it would be a bug, where might it be?
Why might the error oscillate by the amount it does (around +-(0.2 +- learning rate)) even with a very low learning rate? Why might all the outputs be the same, no matter the input?
I've gone over most of it so much that I might be skipping over something, but I think I may have a plain misunderstanding of the theory.
It turns out I was just staring at the FFNeuralNetwork parts too much and accidentally used the wrong input set to confirm the correctness of the network. It actually does work correctly with the right learning rate, momentum, and number of iterations.
Specifically, in main, I was using inputs instead of a smaller array in to test the outputs of the network.

How to process multiple inputs in c++ for artificial neural network

How does C++ process multiple inputs for an artificial neural network in real-time?
I'm assuming this is without using a spiking neural network, but a more traditional one (i.e. just a basic neural network as described here)
http://www.ai-junkie.com/ann/evolved/nnt1.html
Is this possible in a real-time world? I was thinking one would have to process either each input individually (which will always result in the same output, hence the dilemna), or accrue a certain # of inputs per time threshold and then process them at once...
then again, what does someone do with multiple instances of the same input? Process it twice?
I ask this because I'm looking at neuralbot, which I believe uses a normal neural network, but I'm trying to understand ANN's first before I delve into it, and am not sure how an ANN processes multiple inputs before processing target output(s).
Your question is not really clear but I'll try to answer. :)
You can see an ANN (Artificial Neural Network) like a particular case of Adaptive Filters.
There are three main elements:
A sequence of inputs x(n).
A Parametric Variable Filter. In this case the filter is the ANN and the parameters are the neurons weights.
An Update Algorithm that updates the filter parameters according to the error between desired an actual output. In ANN the most used update algorithm is the Backpropagation Algorithm.
In ANN there are two steps:
The Training Step. This is the hard part. You start with random neurons weights. You have a sequence of inputs and their desired outputs and you run the ANN with the update algorithm on. When the error is under a certain threshold you can say that your ANN is trained. This step is usually done off-line (not in real time).
The Execution Step. You have the trained ANN. Now just put in sequence the input in it and use the output. This is usually a fast operation and can be done in real-time (if it is what you mean).
Now.. what do you mean for "multiple inputs at once"? First of all standard computers can do only a very very small number of operations at once, standard PC has 4/8 cores so can do almost 4-8 operations at once. This number is too low for every real world ANN applications.
You said:
what does someone do with multiple instances of the same input? Process it twice?
The answer is Yes. The "Execution Step" is so fast that there are no reason to don't do it. In the "Training Step" duplicated inputs can be removed before training starts (cause the training inputs are known a priori). So there are no problem in this. :)

Face Recognition Using Backpropagation Neural Network?

I'm very new in image processing and my first assignment is to make a working program which can recognize faces and their names.
Until now, I successfully make a project to detect, crop the detected image, make it to sobel and translate it to array of float.
But, I'm very confused how to implement the Backpropagation MLP to learn the image so it can recognize the correct name for the detected face.
It's a great honor for all experts in stackoverflow to give me some examples how to implement the Image array to be learned with the backpropagation.
It is standard machine learning algorithm. You have a number of arrays of floats (instances in ML or observations in statistics terms) and corresponding names (labels, class tags), one per array. This is enough for use in most ML algorithms. Specifically in ANN, elements of your array (i.e. features) are inputs of the network and labels (names) are its outputs.
If you are looking for theoretical description of backpropagation, take a look at Stanford's ml-class lectures (ANN section). If you need ready implementation, read this question.
You haven't specified what are elements of your arrays. If you use just pixels of original image, this should work, but not very well. If you need production level system (though still with the use of ANN), try to extract more high level features (e.g. Haar-like features, that OpenCV uses itself).
Have you tried writing your feature vectors to an arff file and to feed them to weka, just to see if your approach might work at all?
Weka has a lot of classifiers integrated, including MLP.
As I understood so far, I suspect the features and the classifier you have chosen not to work.
To your original question: Have you made any attempts to implement a neural network on your own? If so, where you got stuck? Note, that this is not the place to request a complete working implementation from the audience.
To provide a general answer on a general question:
Usually you have nodes in an MLP. Specifically input nodes, output nodes, and hidden nodes. These nodes are strictly organized in layers. The input layer at the bottom, the output layer on the top, hidden layers in between. The nodes are connected in a simple feed-forward fashion (output connections are allowed to the next higher layer only).
Then you go and connect each of your float to a single input node and feed the feature vectors to your network. For your backpropagation you need to supply an error signal that you specify for the output nodes. So if you have n names to distinguish, you may use n output nodes (i.e. one for each name). Make them for example return 1 in case of a match and 0 else. You could very well use one output node and let it return n different values for the names. Probably it would even be best to use n completely different perceptrons, i.e. one for each name, to avoid some side-effects (catastrophic interference).
Note, that the output of each node is a number, not a name. Therefore you need to use some sort of thresholds, to get a number-name relation.
Also note, that you need a lot of training data to train a large network (i.e. to obey the curse of dimensionality). It would be interesting to know the size of your float array.
Indeed, for a complex decision you may need a larger number of hidden nodes or even hidden layers.
Further note, that you may need to do a lot of evaluation (i.e. cross validation) to find the optimal configuration (number of layers, number of nodes per layer), or to find even any working configuration.
Good luck, any way!