Problem with simple artificial neural network -- adding - c++

I am trying to make a simple artificial neural network work with the backpropagation algorithm. I have created an ANN and I believe I have implemented the BP algorithm correctly, but I may of course be wrong.
Right now, I am trying to train the network by giving it two random numbers (a, b) between 0 and 0.5, and having it add them. Then, of course, each time the output the network gives is compared to the theoretical answer of a + b (which will always be achievable by the sigmoid function).
Strangely, the output always converges to a number between 0 and 1 (as it must, because of the sigmoid function), but the random numbers I'm putting in seem to have no effect on it.
Edit: Sorry, it appears it doesn't converge. Here is an image of the output:
The weights are randomly distributed between -1 and 1, but I have also tried between 0 and 1.
I also tried giving it two constant numbers (0.35,0.9) and trying to train it to spit out 0.5. This works and converges very fast to 0.5. I have also trained it to spit out 0.5 if I give it any two random numbers between 0 and 1, and this also works.
If instead, my target is:
vector<double> target;
target.push_back(.5);
Then it converges very quickly, even with random inputs:
I have tried a couple different networks, since I made it very easy to add layers to my network. The standard one I am using is one with two inputs, one layer of 2 neurons, and a second layer of only one neuron (the output neuron). However, I have also tried adding a few layers, and adding neurons to them. It doesn't seem to change anything. My learning rate is equal to 1.0, though I tried it equal to 0.5 and it wasn't much different.
Does anyone have any idea of anything I could try?
Is this even something an ANN is capable of? I can't imagine it wouldn't be, since they can be trained to do such complicated things.
Any advice? Thanks!
Here is where I train it:
//Initialize it. This will be one with 2 layers, the first having 2 Neurons and the second (output layer) having 1.
vector<int> networkSize;
networkSize.push_back(2);
networkSize.push_back(1);
NeuralNetwork myNet(networkSize,2);
for(int i = 0; i<5000; i++){
double a = randSmallNum();
double b = randSmallNum();
cout << "\n\n\nInputs: " << a << ", " << b << " with expected target: " << a + b;
vector<double> myInput;
myInput.push_back(a);
myInput.push_back(b);
vector<double> target;
target.push_back(a + b);
cout << endl << "Iteration " << i;
vector<double> output = myNet.backPropagate(myInput,target);
cout << "Output gotten: " << output[0];
resultPlot << i << "\t" << abs(output[0] - target[0]) << endl;
}
Edit: I set up my network and have been following from this guide: A pdf. I implemented "Worked example 3.1" and got the same exact results they did, so I think my implementation is correct, at least as far as theirs is.

As @macs states, the maximum output of the standard sigmoid is 1, so if you try to add n numbers from [0, 1], your target should be normalized, i.e. sum(A1, A2, ..., An) / n.
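A minimal sketch of that normalization, assuming every input lies in [0, 1]. The helper names `normalizedTarget` and `denormalize` are made up for illustration and are not part of the asker's code:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: maps the sum of n inputs from [0, n] into [0, 1]
// so that a sigmoid output neuron can actually reach the target.
double normalizedTarget(const std::vector<double>& inputs) {
    assert(!inputs.empty());
    double sum = std::accumulate(inputs.begin(), inputs.end(), 0.0);
    return sum / static_cast<double>(inputs.size());
}

// Hypothetical helper: converts a network output in [0, 1] back to a sum.
double denormalize(double networkOutput, std::size_t inputCount) {
    return networkOutput * static_cast<double>(inputCount);
}
```

The network would then be trained against `normalizedTarget(myInput)`, and its output multiplied back up with `denormalize` when you want to read the sum.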

In a model such as this, the sigmoid function (both in the output and in the intermediate layers) is used mainly for producing something that resembles a 0/1 toggle while still being a continuous function, so using it to represent a range of numbers is not what this kind of network is designed to do. This is because it is designed mostly with classification problems in mind.
There are, of course, other NN models that can do that sort of thing (for example, dropping the sigmoid on the output and just keeping it as a sum of its children).
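As a rough illustration of dropping the sigmoid on the output, here is a sketch of a single linear neuron trained with plain gradient descent to add two numbers. The names `LinearNeuron` and `trainAdder`, the fixed seed, and the learning rate are all assumptions for the example, not taken from the question's code:

```cpp
#include <cassert>
#include <random>

// Hypothetical single linear neuron: output = w0*a + w1*b + bias (no sigmoid).
struct LinearNeuron {
    double w0 = 0.0, w1 = 0.0, bias = 0.0;

    double predict(double a, double b) const { return w0 * a + w1 * b + bias; }

    // One gradient-descent step on the squared error 0.5 * (target - output)^2.
    void train(double a, double b, double target, double learningRate) {
        double error = target - predict(a, b);
        w0 += learningRate * error * a;
        w1 += learningRate * error * b;
        bias += learningRate * error;
    }
};

// Trains on random pairs from [0, 0.5), mirroring the setup in the question.
LinearNeuron trainAdder(int iterations, double learningRate) {
    assert(iterations > 0 && learningRate > 0.0);
    std::mt19937 rng(42);  // fixed seed so runs are repeatable
    std::uniform_real_distribution<double> dist(0.0, 0.5);
    LinearNeuron neuron;
    for (int i = 0; i < iterations; ++i) {
        double a = dist(rng);
        double b = dist(rng);
        neuron.train(a, b, a + b, learningRate);
    }
    return neuron;
}
```

Because addition is itself a linear function, the weights should drift towards 1 and the bias towards 0, and the output is no longer squashed into (0, 1).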
If you can redefine your model in terms of classifying the input, you'll probably get better results.
Some examples of similar tasks for which the network will be more suitable:
Test whether the output is bigger or smaller than a certain constant - this should be very easy.
Output: A series of outputs, each representing a different potential value (for example, one output each for the values between 0 and 10, one for 'more than 10', and one for 'less than 0'). You will want your network to round the result to the nearest integer.
A tricky one will be to try and create a boolean representation of the output by having multiple output nodes.
None of these will give you the precision that you may want, though, since by nature NNs are more 'fuzzy'.

Related

How to create and calculate a formula using an unknown number of variables in C++

Okay, so this is going to be very complicated to explain through text but I will do my best to try.
I am making a universal calculator where one of the functions of the calculator is to process a formula when given an unknown number of variables. I have seen some ways to do this, but for how I'm trying to use this calculator, they won't work.
Example for sum of function:
while (cin >> input)
count++;
Normally this would work but the problem is that I can't have the user input the values over and over again for one formula like for this formula: Formula Example
(Sorry, it's easier for me to explain through a picture.) In it there are multiple places where I have to use the same numbers over and over again. Here is the entire process if you need it to understand what I'm saying:
Entire problem
The problem is that normally I would add another float for every point, but I don't know ahead of time the number of floats the user is going to enter. The ideal way to do this is for the program to ask the user for all the points on the table and for the user to input those points in a format like: "(1,2) (2,4) (3,6)..."
Thinking ahead, would I make a function where the program creates an integer and assigns the integer to a value on the fly? But then how would the actual math formula interact with the new integers if they haven't been created yet?
Talking about this makes my head hurt....
I actually want to say more:
One idea that I tried to make in my head was something like
string VariableName = A [or something]
Then you would reassign VariableName = "A" to VariableName = "B" by something like VariableName = "A"+ 1 (which would equal B).
Then you would repeat that step until the user enters an invalid input. But obviously you can't do math with letters, so I wouldn't know how to do it.
I think that you are overthinking this. It's pretty simple and it doesn't need to store the input values.
The main thing to note is that you need to compute (step 2) the sum of the values of X and Y, the sum of their products, and the sum of X squared. To compute a sum of many values you don't need all the values at once, just one at a time, exactly as the user provides them. So declare four variables: sx, sy, sxy, sxx. Initialize them to 0. For every pair of X and Y you get, add X to sx and Y to sy, add their product to sxy, and add the product of X with itself to sxx.
Now you've got all you need to compute the final result for a and b.
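Here is a sketch of that idea, assuming the formula in the picture is the usual least-squares line fit (the picture is not visible here, so that is a guess). The names `LineFit` and `fitLine` are invented for the example, and the point count `n` is tracked as well because the standard formula needs it:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Result of fitting y = slope * x + intercept. Which of these the original
// formula calls "a" and which "b" is not known, so neutral names are used.
struct LineFit {
    double slope;
    double intercept;
};

// Reads points formatted like "(1,2) (2,4) (3,6)" and keeps only running sums.
LineFit fitLine(const std::string& text) {
    std::istringstream in(text);
    double sx = 0, sy = 0, sxy = 0, sxx = 0;
    int n = 0;
    char open, comma, close;
    double x, y;
    while (in >> open >> x >> comma >> y >> close) {
        if (open != '(' || comma != ',' || close != ')') break; // invalid input ends the list
        sx += x;
        sy += y;
        sxy += x * y;
        sxx += x * x;
        ++n;
    }
    double denom = n * sxx - sx * sx;
    assert(n >= 2 && denom != 0.0); // need at least two distinct x values
    double slope = (n * sxy - sx * sy) / denom;
    double intercept = (sy - slope * sx) / n;
    return {slope, intercept};
}
```

No point is ever stored: each one is folded into the four sums as soon as it is read, so the number of points does not need to be known in advance.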
Anyway a good C++ book would be useful.

Own simple load balancer for dynamic chances / probabilities (in C++ but language independent)

Hello lovely people,
my program is written as a scalable network framework. It consists of several components that run as individual programs. Additional instances of the individual components can be added dynamically.
The components initially register with IP and Port at a central unit. This manager periodically sends to the components where other components can be found. But not only that, each component is assigned a weight / probability / chance of how often it should be addressed compared to the others.
As an example: 1 Master, Components A, B, C
All components are registered at the Master; the Master sends to A: [B(127.0.0.1:8080, 3); C(127.0.0.1:8081, 5)]
A runs in a loop and calculates the communication partner over and over again from this data.
So, A should request B and C in a 3 to 5 ratio. How many requests each one ultimately gets depends on the running performance. This is about the ratio.
Of course, the numbers 3 and 5 come periodically and change dynamically. And it's not about 3 components but potentially hundreds.
My idea was:
Add 3 and 5. Calculate a random number between 1 and 8. If it is greater than 3, take C else B ....
But I think that's not a clean solution. Probably computationally intensive in every loop. In addition, management and data structures are expensive. In addition, I think that a random number from the STL is not balanced enough.
Can someone perhaps give me a hint on how I could implement this cleanly, or does someone have experience with it or an idea?
Thank you in every case;)
I have an idea for you:
Why not try it with cumulative probabilities?
1.) Generate a uniformly distributed random number.
2.) Iterate through your list until the cumulative probability of the visited element is greater than the random number.
Look at this (Java code but will also work in C++), (your hint that you use C++ was very good!!!)
double p = Math.random();
double cumulativeProbability = 0.0;
for (Item item : items) {
cumulativeProbability += item.probability();
if (p <= cumulativeProbability) {
return item;
}
}
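For reference, a C++ sketch of the same cumulative approach, working directly on raw weights such as 3 and 5 rather than on probabilities that sum to 1. The function names `pickByCumulativeWeight` and `pickPartner` are made up for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Returns the index of the first element whose cumulative weight exceeds r.
// r is expected to lie in [0, sum of weights).
std::size_t pickByCumulativeWeight(const std::vector<double>& weights, double r) {
    assert(!weights.empty());
    double cumulative = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        cumulative += weights[i];
        if (r < cumulative) return i;
    }
    return weights.size() - 1; // guards against floating-point round-off
}

// Draws r uniformly and picks a partner. The generator is passed in so it
// can be seeded once and reused across calls.
std::size_t pickPartner(const std::vector<double>& weights, std::mt19937& rng) {
    double total = 0.0;
    for (double w : weights) total += w;
    assert(total > 0.0);
    std::uniform_real_distribution<double> dist(0.0, total);
    return pickByCumulativeWeight(weights, dist(rng));
}
```

With weights {3, 5}, index 0 (B) should come up in roughly 3 out of 8 draws. The standard library also offers `std::discrete_distribution` in `<random>`, which takes a list of weights and may save you the loop entirely.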

Neural Network gives same output for different inputs, doesn't learn

I have a neural network written in standard C++11 which I believe follows the back-propagation algorithm correctly (based on this). If I output the error in each step of the algorithm, however, it seems to oscillate without dampening over time. I've tried removing momentum entirely and choosing a very small learning rate (0.02), but it still oscillates at roughly the same amplitude per network (with each network having a different amplitude within a certain range).
Further, all inputs result in the same output (a problem I found posted here before, although for a different language. The author also mentions that he never got it working.)
The code can be found here.
To summarize how I have implemented the network:
Neurons hold the current weights to the neurons ahead of them, previous changes to those weights, and the sum of all inputs.
Neurons can have their value (sum of all inputs) accessed, or can output the result of passing said value through a given activation function.
NeuronLayers act as Neuron containers and set up the actual connections to the next layer.
NeuronLayers can send the actual outputs to the next layer (instead of pulling from the previous).
FFNeuralNetworks act as containers for NeuronLayers and manage forward-propagation, error calculation, and back-propagation. They can also simply process inputs.
The input layer of an FFNeuralNetwork sends its weighted values (value * weight) to the next layer. Each neuron in each layer afterwards outputs the weighted result of the activation function unless it is a bias, or the layer is the output layer (biases output the weighted value, the output layer simply passes the sum through the activation function).
Have I made a fundamental mistake in the implementation (a misunderstanding of the theory), or is there some simple bug I haven't found yet? If it would be a bug, where might it be?
Why might the error oscillate by the amount it does (around +-(0.2 +- learning rate)) even with a very low learning rate? Why might all the outputs be the same, no matter the input?
I've gone over most of it so much that I might be skipping over something, but I think I may have a plain misunderstanding of the theory.
It turns out I was just staring at the FFNeuralNetwork parts too much and accidentally used the wrong input set to confirm the correctness of the network. It actually does work correctly with the right learning rate, momentum, and number of iterations.
Specifically, in main, I was using inputs instead of a smaller array in to test the outputs of the network.

Log likelihood output of EM in openCV

I am using the EM model in OpenCV and have run into difficulty with the log likelihood value output from EM::train. I am training the EM models on separate (but correlated) data, and the log likelihood values are linked to the absolute values of the data: when the training values are greater than 100, EM::train returns strongly positive log likelihood values, and when the training values are less than one, large negative values are returned. My understanding is that this doesn't make any sense given the way EM works.
The section of code that trains the models is below, in case I am making any silly mistakes (I am quite new to OpenCV). std::cout confirms that the values are larger than one when the training values are larger than 100 and smaller than -1 when the training values are less than 1.
My problem is similar to this post:
OpenCV: Output of the predict function of Expectation Maximization
except there is no way around using the log-likelihoods without implementing something like a DPGMM.
Many Thanks,
Sam
{
double logs =0;
cv::EM model (g,1,parameters);
model.train(sample_input.col(i),log_likelihoods);
for(int j = 0; j < number_selected; j++ )
{
logs += log_likelihoods.at<double>(j);
std::cout << log_likelihoods.at<double>(j) <<' ';
}
mean_of_logs[i] += 0.1*logs;
reduced_models.push_back(model);
}

Better alternative to divide and conquer algorithm

First let me explain the problem I'm trying to solve. I'm integrating my code with 3rd party library which does quite complicated financial predictions. For the purposes of this question let's just say I have a blackbox which returns y when I pass in x.
Now, what I need to do is find input (x) for a given output (y). Since I know lowest and highest possible input values I wrote the following algorithm:
define starting input range (minimum input value to maximum input value)
divide the range into two equal parts and find output for a middle value
find which half output falls into
repeat steps 2 and 3 until range is too small to divide any further
This algorithm does the job nicely, I don't see any problems with it. However, is there a faster way to solve this problem?
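For reference, the steps above amount to a bisection search. A minimal sketch, assuming the black box is increasing over the input range; `bisectForInput` and its parameters are names invented for this example:

```cpp
#include <cassert>
#include <functional>

// Finds x in [lo, hi] with f(x) close to target, assuming f is increasing
// on that range. Stops when the interval is narrower than tolerance.
double bisectForInput(const std::function<double(double)>& f,
                      double lo, double hi, double target, double tolerance) {
    assert(lo < hi && tolerance > 0.0);
    while (hi - lo > tolerance) {
        double mid = lo + (hi - lo) / 2.0;
        if (f(mid) < target) {
            lo = mid;   // target lies in the upper half
        } else {
            hi = mid;   // target lies in the lower half (or at mid)
        }
    }
    return lo + (hi - lo) / 2.0;
}
```

Each pass halves the range, so the number of calls to the black box grows only with the logarithm of (range / tolerance).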
It sounds like x and y are strongly correlated (i.e. as x increases, so does y), as otherwise your divide and conquer algorithm wouldn't work.
Assuming this is the case, and you could work out a correlation factor, then you might be able to multiply the midpoint by the correlation factor to potentially home in on the expected value quicker.
Please note that I've not tested this idea at all, but it's something to think about. Possible improvements would be to make the correlationFactor a moving average, or precompute it based on, say, the deciles between xLow and xHigh.
Also, this assumes that calling f(x) is relatively inexpensive. If it is expensive, then the increased number of calls to f(x) would dwarf any savings. In fact - I'm starting to think this is a stupid idea...
Hopefully the following pseudo-code illustrates what I mean:
DivideAndConquer(xLow, xHigh, correlationFactor, expectedValue)
    // correlationFactor is the fraction of the range at which to probe
    // (pass 0.5 on the first call to start at the plain midpoint)
    xMid = xLow + (xHigh - xLow) * correlationFactor
    // Add some range checks to make sure that xMid is within xLow and xHigh!!
    y = f(xMid)
    if (y == expectedValue)
        return xMid
    elseif (y < expectedValue)
        // y increases with x, so the answer lies above xMid
        correlationFactor = (expectedValue - y) / (f(xHigh) - y)
        return DivideAndConquer(xMid, xHigh, correlationFactor, expectedValue)
    else
        correlationFactor = (expectedValue - f(xLow)) / (y - f(xLow))
        return DivideAndConquer(xLow, xMid, correlationFactor, expectedValue)
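A C++ sketch of the same general idea, written as an interpolation-guided search (often called the false position method). This is my own take on it rather than a tested solution, and it assumes f is continuous and increasing and that the target lies between f(lo) and f(hi). The name `interpolateForInput` and the `maxCalls` cap are invented for the example:

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Each probe is placed where a straight line through the current end points
// would hit the target, instead of at the midpoint. End-point outputs are
// cached so f is called only once per iteration.
double interpolateForInput(const std::function<double(double)>& f,
                           double lo, double hi, double target,
                           double tolerance, int maxCalls = 200) {
    double yLo = f(lo);
    double yHi = f(hi);
    assert(lo < hi && yLo <= target && target <= yHi);
    double x = lo;
    for (int calls = 0; calls < maxCalls && yHi > yLo; ++calls) {
        x = lo + (hi - lo) * (target - yLo) / (yHi - yLo);
        double y = f(x);
        if (std::fabs(y - target) <= tolerance) break; // close enough in output
        if (y < target) { lo = x; yLo = y; }           // answer is above x
        else            { hi = x; yHi = y; }           // answer is below x
    }
    return x;
}
```

On a nearly linear black box this tends to need far fewer calls than halving the range. On a strongly curved one it can need more, because one end point may stay fixed for many steps, so it is worth counting calls against the real function before switching.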