Interpretation of a Multilayer Perceptron Modeling Result on Weka

I'm generating a power model using the Multilayer Perceptron in Weka, a machine learning toolkit.
Weka outputs the following generated power model, but I don't know how to interpret it.
How can I calculate the predicted value using this model generated by Weka? I want to know how to calculate it by hand from the model.
Thanks.
=== Classifier model (full training set) ===
Linear Node 0
Inputs Weights
Threshold -0.040111709313733535
Node 1 -1.8468414006209548
Node 2 0.8245441127585728
Node 3 -0.6384807874184006
Node 4 -0.7484784535220612
Sigmoid Node 1
Inputs Weights
Threshold -0.24446294747264816
Attrib CPU-User -0.608249350584644
Attrib CPU-System 0.13288901868419942
Attrib CPU-Idle 1.0072001456456134
Attrib GPS 0.39886318520181463
Attrib WIFI 2.661390547312707
Attrib Disk-Write 3.3144190265114104
Attrib Screen -0.18379082022126372
Sigmoid Node 2
Inputs Weights
Threshold -0.04552879905091134
Attrib CPU-User 1.2010400180021503
Attrib CPU-System -0.415901207849663
Attrib CPU-Idle -1.8201808907618635
Attrib GPS 0.3297713837591742
Attrib WIFI 2.670046643619425
Attrib Disk-Write 1.0132120671943607
Attrib Screen 1.5785512067159402
Sigmoid Node 3
Inputs Weights
Threshold -7.438472914350278
Attrib CPU-User -6.382669043988483
Attrib CPU-System -1.6622872921207548
Attrib CPU-Idle -0.12729502604878612
Attrib GPS -0.9716992577028621
Attrib WIFI 0.6911695390337304
Attrib Disk-Write -1.1769266028873722
Attrib Screen 0.5101113538728531
Sigmoid Node 4
Inputs Weights
Threshold -5.509838959208244
Attrib CPU-User -0.3709271557180943
Attrib CPU-System -1.7448007514288941
Attrib CPU-Idle -0.08176108597065958
Attrib GPS -1.0234447340811823
Attrib WIFI -1.5759133030274077
Attrib Disk-Write 0.2376861365371351
Attrib Screen -1.5654514081278506
Class
Input
Node 0
Time taken to build model: 0.81 seconds
=== Predictions on test split ===
inst#, actual, predicted, error
1 153727.273 169587.843 15860.57
2 159036.364 168657.043 9620.68
....

This presentation gives some detail on the background and equations used for neural networks. Weka's output gives you the type of each node and the inputs and weights. You should be able to calculate the numbers yourself using that information.

In the case of the Multilayer Perceptron in WEKA: if you have not altered the network topology, the nodes in the hidden layer of this network are all sigmoid units, while the output nodes are linear units.
E.g. 'Linear Node 0' is your OUTPUT unit, and Sigmoid Nodes 1 to 4 are your 4 hidden units. All the values given are your interconnection weights; you can use them to manually reproduce the results being obtained.

The question is old now, but others might be interested in the answer. Note that by default, Weka normalises the inputs and output to the -1 to 1 range, which often improves model accuracy. You will need to scale your inputs to the same range, and scale your output from that range back to the original range.
The activation function is 1/(1+exp(-activation)) for the non-linear units. With more than a few units, this quickly gets messy, but you can get the right answer. Note also that what Weka calls Threshold, is also called Bias in the literature, and is simply added to the activation of the unit it is attached to.
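Putting the two answers above together, here is a minimal sketch of the hand calculation in Python, using the weights printed in the question. The min/max values needed to scale attributes to and from Weka's internal -1 to 1 range come from your training data and are not shown in the model output, so that step is stated as an assumption:

import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# Thresholds (biases) and weights copied from the Weka output above.
# Attribute order: CPU-User, CPU-System, CPU-Idle, GPS, WIFI, Disk-Write, Screen.
hidden = [  # (threshold, attribute weights) for Sigmoid Nodes 1..4
    (-0.24446294747264816, [-0.608249350584644, 0.13288901868419942, 1.0072001456456134,
                            0.39886318520181463, 2.661390547312707, 3.3144190265114104,
                            -0.18379082022126372]),
    (-0.04552879905091134, [1.2010400180021503, -0.415901207849663, -1.8201808907618635,
                            0.3297713837591742, 2.670046643619425, 1.0132120671943607,
                            1.5785512067159402]),
    (-7.438472914350278,   [-6.382669043988483, -1.6622872921207548, -0.12729502604878612,
                            -0.9716992577028621, 0.6911695390337304, -1.1769266028873722,
                            0.5101113538728531]),
    (-5.509838959208244,   [-0.3709271557180943, -1.7448007514288941, -0.08176108597065958,
                            -1.0234447340811823, -1.5759133030274077, 0.2376861365371351,
                            -1.5654514081278506]),
]

out_threshold = -0.040111709313733535  # Linear Node 0
out_weights = [-1.8468414006209548, 0.8245441127585728,
               -0.6384807874184006, -0.7484784535220612]

def predict_normalised(x):
    # x: the 7 attribute values, assumed already scaled to [-1, 1] as Weka does internally.
    h = [sigmoid(t + sum(w * xi for w, xi in zip(ws, x))) for t, ws in hidden]
    # The output node is linear: its threshold plus the weighted sum of the hidden outputs.
    return out_threshold + sum(w * hi for w, hi in zip(out_weights, h))

Assuming each attribute was scaled as x_norm = 2 * (x - min) / (max - min) - 1, the prediction is mapped back to original units with the inverse: y = (y_norm + 1) / 2 * (y_max - y_min) + y_min, where the min/max come from the training data.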

Related

How to visualize surface normal of kinect data using PCL in ROS?

I am trying to get surface normals from my Kinect2 data using PCL in ROS, but I am having trouble visualizing the normal data.
I am using the following viewer to view the real-time point cloud.
I have added PCL's point-normal code to this viewer to calculate and visualize the normals.
I am getting the following runtime error:
ERROR: In /home/chandan_main/Downloads/VTK-7.1.0/Rendering/OpenGL2/vtkOpenGLPolyDataMapper.cxx, line 1794
vtkOpenGLPolyDataMapper (0xa1ea5e0): failed after UpdateShader 1 OpenGL errors detected
0 : (1281) Invalid value
[pcl::IntegralImageNormalEstimation::setInputCloud] Input dataset is not organized (height = 1).
[pcl::IntegralImageNormalEstimation::initCompute] Input dataset is not organized (height = 1).
[addPointCloudNormals] The number of points differs from the number of normals!
[pcl::IntegralImageNormalEstimation::setInputCloud] Input dataset is not organized (height = 1).
[pcl::IntegralImageNormalEstimation::initCompute] Input dataset is not organized (height = 1).
[addPointCloudNormals] The number of points differs from the number of normals!
[pcl::IntegralImageNormalEstimation::setInputCloud] Input dataset is not organized (height = 1).
[pcl::IntegralImageNormalEstimation::initCompute] Input dataset is not organized (height = 1).
[addPointCloudNormals] The number of points differs from the number of normals!
I am able to get the normals now. I just used:
while (!viewer->wasStopped())
{
    viewer->spinOnce(100);
    boost::this_thread::sleep(boost::posix_time::microseconds(100000));
}
because I was trying to get the normals in real time, which was producing the errors. I also rebuilt the VTK library, which had issues.

Why scale pixels between -1 and 1 sample-wise in the preprocess step for image classification

In the preprocess_input() function found at the link below, the pixels are scaled between -1 and 1; I have seen this used elsewhere as well. What is the reason for scaling between -1 and 1, as opposed to 0 and 1? I was under the impression that the common ranges for pixels were 0-255, or 0-1 if normalized.
https://github.com/keras-team/keras/blob/master/keras/applications/imagenet_utils.py
Normalizing between -1 and 1 aims to give the data a mean of 0 and a standard deviation of 1 (i.e. roughly a standard normal distribution). The choice of activation function used in the network also determines which kind of normalization is appropriate, especially when using batch normalization.
For example, if sigmoid is used and the normalization is done between 0 and 1, then all the negative values produced inside the network (from multiplying weights with inputs and then adding biases) are mapped to zero, which leads to more vanishing gradients during backpropagation.
Whereas with tanh and normalization between -1 and 1, those negative values are mapped to corresponding negative values between -1 and 0.
tanh is often the activation function used in convolutional networks and GANs, and is generally preferred to sigmoid.
"Tanh. The tanh non-linearity is shown on the image above on the right. It squashes a real-valued number to the range [-1, 1]. Like the sigmoid neuron, its activations saturate, but unlike the sigmoid neuron its output is zero-centered. Therefore, in practice the tanh non-linearity is always preferred to the sigmoid nonlinearity." — from Andrej Karpathy's well-known course, cs231n.github.io/neural-networks-1

Caffe: Multi-Label Images with Varying Number of Labels

I have a dataset where the images have a VARYING number of labels: between 1 and 5 labels per image, drawn from 100 classes.
After googling, it seems that an HDF5 database with a Slice layer can deal with multiple labels, as in the following URL.
The only problem is that it assumes a fixed number of labels. Following this approach, I would have to create a 1x100 vector per image, where the entry is 1 for the labeled classes and 0 for the non-labeled classes, as in the following definition:
layers {
  name: "slice0"
  type: SLICE
  bottom: "label"
  top: "label_matrix"
  slice_param {
    slice_dim: 1
    slice_point: 100
  }
}
where each image has a label looking like (1,0,0,...,1,...,0,...,0,1), a vector of 100 dimensions.
Now, I apologize that my question becomes somewhat vague, but is this a feasible idea? Or is there a better approach to this problem?
I get that you have 5 types of labels that are not always present for each data point. 1 of the 5 labels is for 100-way classification. Correct so far?
I would suggest always writing all 5 labels into your HDF5 file and using a special value for when a label is missing. You can then use the ignore_label option to skip computing the loss for that label in that iteration. Using it requires adding loss_param { ignore_label: Y } to the loss layer in your network prototxt definition, where Y is a scalar.
The backpropagated error will only be a function of labels that are present. If input X does not have a valid value for a label, the network will still produce an estimate for that label. But it will not be penalized for it. The output is produced without any effect on how the weights are updated in that iteration. Only outputs for non-missing labels contribute to the error signal and the weight gradients.
It seems that only the Accuracy and SoftmaxWithLoss layers support ignore_label.
Each label is a 1x5 matrix. The first entry can be used for the 100-way classification (e.g. values in [0-99]), and entries 2:5 hold scalars for the values the other labels can take. The order of the columns must be the same for all entries in your dataset. A missing label is marked by a special value of your choosing; this special value has to lie outside the set of valid label values. What that is will depend on what your labels represent. If a label value of -1 never occurs, you can use it to flag a missing label.
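As a concrete illustration (not from the original answer), here is a minimal sketch of writing such 1x5 labels with a -1 sentinel to HDF5 using h5py; the dataset names "data" and "label" are the conventional ones read by Caffe's HDF5Data layer, and the image tensor here is a random placeholder:

import h5py
import numpy as np

MISSING = -1  # sentinel; must lie outside the valid range of every label

# Hypothetical labels for 3 images: column 0 is the 100-way class [0-99],
# columns 1-4 are the other labels, MISSING where a label is absent.
labels = np.array([
    [42, 1, 0, MISSING, MISSING],
    [ 7, MISSING, 1, 1, 0],
    [99, 0, MISSING, MISSING, 1],
], dtype=np.float32)

images = np.random.rand(3, 3, 224, 224).astype(np.float32)  # N x C x H x W placeholder

with h5py.File("train.h5", "w") as f:
    f.create_dataset("data", data=images)
    f.create_dataset("label", data=labels)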

How to find why a RBM does not work correctly?

I'm trying to implement an RBM and I'm testing it on the MNIST dataset. However, it does not seem to converge.
I have 28x28 visible units and 100 hidden units, and I'm using mini-batches of size 50. For each epoch, I traverse the whole dataset. I have a learning rate of 0.01 and a momentum of 0.5. The weights are randomly generated from a Gaussian distribution with mean 0.0 and stdev 0.01. The visible and hidden biases are initialized to 0. I'm using a logistic sigmoid function as the activation.
After each epoch, I compute the average reconstruction error of all mini-batches, here are the errors I get:
epoch 0: Reconstruction error average: 0.0481795
epoch 1: Reconstruction error average: 0.0350295
epoch 2: Reconstruction error average: 0.0324191
epoch 3: Reconstruction error average: 0.0309714
epoch 4: Reconstruction error average: 0.0300068
I plotted histograms of the parameters to check (left to right: hidden biases, weights, visible biases; top row: values, bottom row: updates):
Histogram of the weights after epoch 3: http://baptiste-wicht.com/static/finals/histogram_epoch_3.png
Histogram of the weights after epoch 4: http://baptiste-wicht.com/static/finals/histogram_epoch_4.png
but, except for the hidden biases, which seem a bit weird, the rest seems OK.
I also tried to plot the hidden weights:
Weights after epoch 3: http://baptiste-wicht.com/static/finals/hiddens_weights_epoch_3.png
Weights after epoch 4: http://baptiste-wicht.com/static/finals/hiddens_weights_epoch_4.png
(they are plotted in two colors using this expression:
static_cast<size_t>(value > 0
    ? (static_cast<size_t>(value * 255.0) << 8)     // positive weights in the green channel
    : (static_cast<size_t>(-value * 255.0) << 16))  // negative weights in the red channel
    << " ";
)
And here, they do not make sense at all...
If I train further, the reconstruction error falls a bit more, but does not go below 0.025. Even if I change the momentum after some time, it goes up and then comes down a bit, but not meaningfully. Moreover, the weights do not make any more sense after more epochs. In most example implementations I've seen, the weights started making some sense after iterating through the complete dataset two or three times.
I've also tried to reconstruct an image from the visible units, but the results seem almost random.
What could I do to check what goes wrong in my implementation? Should the weights be within some range? Does something seem really strange in the data?
Complete code: https://github.com/wichtounet/dbn/blob/master/include/rbm.hpp
You are using a very small learning rate. In most NNs trained by SGD you start out with a higher learning rate and decay it over time. Search for learning rate or adaptive learning rate to find more information on that.
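For illustration only (hypothetical values, not from the original answer), a typical exponential decay schedule looks like this:

def learning_rate(epoch, lr0=0.1, decay=0.95):
    # Start high and shrink the rate each epoch: lr0 * decay^epoch.
    return lr0 * decay ** epoch

# e.g. epoch 0 -> 0.100, epoch 10 -> ~0.060, epoch 50 -> ~0.0077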
Second, when implementing a new algorithm, I would recommend finding the paper that introduced it and reproducing its results. A good paper should include most of the settings used, or the method used to determine them.
If a paper is unavailable, or it was tested on a dataset you don't have access to, find a working implementation and compare the outputs when using the same settings. If the implementations are not feature-compatible, turn off as many non-shared features as you can.

How to find/detect optimal parameters of a Grid Search in Libsvm+Weka?

I'm trying to use SVM within the Weka framework, so I'm using LibSVM. I'm new to SVM, and reading the guide on the LibSVM site I learned that it is possible to discover optimal parameters for SVM (cost and gamma) using grid search. So I chose GridSearch in Weka, and I obtained bad classification results (a TN rate around 1%). How should I interpret these results? If I get bad results using the "optimal" parameters, is there no chance for me to get a better classification? In other words: does GridSearch give me the best results I can obtain using SVM?
My dataset consists of 1124 instances (89% negative class, 11% positive class) and 31 attributes (2 nominal, the rest numeric). I'm using 10-fold cross-validation on the whole dataset to test the model.
I tried GridSearch (I normalized each attribute's values between 0 and 1 and did no feature selection, but I changed the class values from 0 and 1 to 1 and -1 according to SVM theory, though I don't know if that is useful) with these parameters: cost from 1 to 18 with a step of 1.0, and gamma from -5 to 10 with a step of 1.0. The results are 93.6% sensitivity and 64.8% specificity, but the search takes around 1 hour to complete!
I'd like to get better results than with a decision tree. Using feature selection (Info Gain ranking) + SMOTE oversampling + cost-sensitive learning I obtained 91% sensitivity and 80% specificity. Is there a way to tune an SVM without trying every possible range of values for cost and gamma?
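For comparison (using scikit-learn rather than Weka, purely as an illustrative assumption), the LIBSVM guide recommends a coarse grid over exponentially spaced values of C and gamma, refined around the best cell afterwards; a minimal sketch:

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Coarse log-scale grid as suggested by the LIBSVM guide: C = 2^-5 ... 2^15,
# gamma = 2^-15 ... 2^3, stepping by powers of 4.
param_grid = {
    "C": 2.0 ** np.arange(-5, 16, 2),
    "gamma": 2.0 ** np.arange(-15, 4, 2),
}

search = GridSearchCV(SVC(), param_grid, cv=10,
                      scoring="balanced_accuracy",  # useful with an 89/11 class imbalance
                      n_jobs=-1)
# search.fit(X, y)            # X, y: your normalised features and class labels
# print(search.best_params_)  # then refine the grid around these values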