Caffe GoogleNet classification.cpp gives random outputs - c++

I used the Caffe GoogleNet model to train on my own data (10k images, 2 classes). I stopped it at the 400,000th iteration with an accuracy of ~80%.
If I run the below command:
./build/examples/cpp_classification/classification.bin
models/bvlc_googlenet/deploy.prototxt
models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel
data/ilsvrc12/imagenet_mean.binaryproto
data/ilsvrc12/synset_words.txt
1.png
it gives me a different -- apparently random -- result each time (i.e. if I run it n times, then I get n different results). Why? Did my training fail? Does it still use the old weights from the reference model?

I don't think it is a problem with the training. Even if the training had gone wrong, it should give the same (possibly wrong) output every time. If you are getting random results, it indicates that the weights are not being loaded properly.
When you load a .caffemodel against a .prototxt, Caffe will load the weights of all the layers in the prototxt whose names match the ones in the caffemodel. For the other layers, it will do a random initialisation (Gaussian, Xavier, etc., according to the specification in the prototxt).
So the best thing for you to do now is to check if the model was trained using the same prototxt you are using now.
I see that you are using the GoogleNet prototxt with the reference_caffenet caffemodel. Is this intentional?
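If it was not intentional, a quick way to see which layers will actually receive trained weights is to compare the layer names stored in the caffemodel with the layer names in the deploy prototxt. Below is a rough sketch using pycaffe's protobuf bindings (the paths are the ones from the question; adjust as needed):

from caffe.proto import caffe_pb2
from google.protobuf import text_format

# Layer names stored in the trained .caffemodel (covers both new "layer" and old "layers" formats)
trained = caffe_pb2.NetParameter()
with open('models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel', 'rb') as f:
    trained.ParseFromString(f.read())
trained_names = {l.name for l in trained.layer} | {l.name for l in trained.layers}

# Layer names in the deploy prototxt
deploy = caffe_pb2.NetParameter()
with open('models/bvlc_googlenet/deploy.prototxt') as f:
    text_format.Merge(f.read(), deploy)
deploy_names = {l.name for l in deploy.layer}

# Any layer printed here gets a fresh random initialisation on every run -> random outputs
print(sorted(deploy_names - trained_names))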

When you want to deploy the fine-tuned model, you should check two main things:
Inputs:
The input image should be in BGR channel order instead of RGB (this is what e.g. OpenCV produces and what Caffe expects).
Mean file: is it the same mean file that was used during training? (See the sketch after this checklist.)
Prototxt:
When fine-tuning a model you usually rename some layers in the original prototxt, so check whether the deploy prototxt uses the same layer names as the trained model.
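As a rough illustration of the two input checks above (a sketch assuming pycaffe and OpenCV; the file paths are the ones from the original question):

import cv2
import caffe
from caffe.proto import caffe_pb2

# 1) OpenCV loads images in BGR order, which is the channel order Caffe expects.
img = cv2.imread('1.png')                        # H x W x 3, BGR, uint8

# 2) Load the SAME mean file that was used during training and inspect it.
blob = caffe_pb2.BlobProto()
with open('data/ilsvrc12/imagenet_mean.binaryproto', 'rb') as f:
    blob.ParseFromString(f.read())
mean = caffe.io.blobproto_to_array(blob)[0]      # shape: 3 x H x W
print('mean shape:', mean.shape, 'per-channel mean:', mean.mean(axis=(1, 2)))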
There are also some fine-tune tricks and the CS231n_transfer_learning notes which are very useful for fine-tuning.


Approach to get the weight values from Darknet's pre-trained weights?

I'm currently trying to implement the YOLOv3 object detection model in C (only detection, not training).
I have tested my convolution method with arbitrary values and it seems to be working as I expected.
Before stacking up multiple method calls to do forward propagation, I thought it would be safe to test with the actual pretrained weight file data.
When I looked at Darknet's pre-trained weight file, it was a huge chunk of binary data. I tried converting it to hex and decimals, but it still isn't easy to pinpoint which values to use.
So, my question is: what should I do to extract the decimal values of the weights or the filter values so that I can use them in the same order as the forward propagation happening in YOLOv3?
*I'm currently trying to build my C version of YOLOv3 using the structure image shown at https://www.itread01.com/content/1541167345.html
*My C code will run on an FPGA board called MicroZed, along with other HDL code.
*I tried adding printf calls at various places in the Darknet code to see what data moves around when YOLOv3 runs; however, when I ran it in a Linux terminal, it didn't show anything new and kept producing the same output.
Any help or advice will be really appreciated. Thank you!
I am not too sure if there is a direct way to read Darknet weights, but you can convert them into .h5 format and obtain the weight values from that.
You can convert the Darknet YOLOv3 weights into .h5 format (used by Keras) by using the appropriate command from this repository.
You can choose the command based on your YOLO version from the list shown in the ReadMe of the linked repo. For the standard YOLOv3, the command for converting is
python tools/model_converter/convert.py cfg/yolov3.cfg weights/yolov3.weights weights/yolov3.h5
Once you have the .h5 weights, you can use the code snippet below for obtaining the values from the weights. credit/source
import h5py

path = "<path to weights>.h5"
weights = {}
keys = []
with h5py.File(path, 'r') as f:          # open the file
    f.visit(keys.append)                 # collect all keys in the file
    for key in keys:
        if ':' in key:                   # keys containing ':' hold actual parameter data
            param_name = f[key].name
            weights[param_name] = f[key][()]   # f[key].value is deprecated in recent h5py
            print(param_name, weights[param_name])

Why does dlib's neural net xml export contain different parameters for layers than specified by the trainer?

In dlib it is possible to simply output a neural net via the dlib::net_to_xml(some_net, some_filename) function. It works fine, but it also displays information such as the net type and learning_rate_mult. In my case, for example, it exports the following line for one of the layers (the rest of the exported xml is omitted for clarity):
<fc num_outputs='42' learning_rate_mult='1' weight_decay_mult='1' bias_learning_rate_mult='500' bias_weight_decay_mult='0'>
Those values are correct, except for learning_rate_mult and weight_decay_mult, which always show 1. I tried setting them to different values with the trainer class, like 2 or 0.0001, but they keep showing 1. I verified that the values 2 and 0.0001 were indeed used by the net.
Might this be a bug in dlib's dlib::net_to_xml function?
Those values apply per layer and are independent of the trainer's values. The layer parameters are relevant for optimizers like the Adam optimization algorithm:
https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/
You can change them by specifying them in every layer.
So no, it is not a bug.

Training and Test Set in Weka Incompatible in Text Classification

I have two datasets regarding whether a sentence contains a mention of a drug adverse event or not. Both the training and test set have only two fields: the text and the labels {Adverse Event, No Adverse Event}. I have used Weka with the StringToWordVector filter to build a Random Forest model on the training set.
I want to test the model by removing the class labels from the test dataset, applying the StringToWordVector filter to it, and running the model on it. When I try to do that, it gives me an error saying the training and test set are not compatible, probably because the filter identifies a different set of attributes for the test dataset. How do I fix this and output the predictions for the test set?
The easiest way to do this for a one off test is not to pre-filter the training set, but to use Weka's FilteredClassifier and configure it with the StringToWordVector filter, and your chosen classifier to do the classification. This is explained well in this video from the More Data Mining with Weka online course.
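For example, a command-line sketch of that FilteredClassifier setup (class names are from a recent Weka version; train.arff and test.arff are placeholder file names):

java -cp weka.jar weka.classifiers.meta.FilteredClassifier \
     -t train.arff -T test.arff \
     -F weka.filters.unsupervised.attribute.StringToWordVector \
     -W weka.classifiers.trees.RandomForest

Adding a -p 0 option should additionally print the per-instance predictions for the test set.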
For a more general solution, if you want to build the model once then evaluate it on different test sets in future, you need to use InputMappedClassifier:
Wrapper classifier that addresses incompatible training and test data
by building a mapping between the training data that a classifier has
been built with and the incoming test instances' structure. Model
attributes that are not found in the incoming instances receive
missing values, so do incoming nominal attribute values that the
classifier has not seen before. A new classifier can be trained or an
existing one loaded from a file.
Weka requires a label even for the test data. It uses the labels or "ground truth" of the test data to compare the model's output against it and measure the model's performance. How would you tell whether a model is performing well if you don't know whether its predictions are right or wrong? Thus, the test data needs to have the very same structure as the training data in Weka, including the labels. No worries, the labels are not used to help the model with its predictions.
The best way to go is to select cross-validation (e.g. 10-fold cross-validation), which will automatically split your data into 10 parts, using 9 for training and the remaining 1 for testing. This procedure is repeated 10 times so that each of the 10 parts has been used once as test data. The final performance verdict is the average over all 10 rounds. Cross-validation gives you a quite realistic estimate of the model's performance on new, unseen data.
What you were trying to do, namely using the exact same data for training and testing, is a bad idea, because the measured performance you end up with is way too optimistic. This means you'll get very impressive figures like 98% accuracy during testing, but as soon as you use the model on new, unseen data your accuracy might drop to a much worse level.

When training a single batch, is iteration of examples necessary (optimal) in python code?

Say I have one batch that I want to train my model on. Do I simply run tf.Session()'s sess.run(batch) once, or do I have to iterate through all of the batch's examples with a loop in the session? I'm looking for the optimal way to iterate/update the training ops, such as loss. I thought tensorflow would handle it itself, especially in the cases where tf.nn.dynamic_rnn() takes in a batch dimension for listing the examples. I thought, perhaps naively, that a for loop in the python code would be the inefficient method of updating the loss. I am using tf.losses.mean_squared_error(batch) for a regression problem.
My regression problem is: given two lists of word vectors (300d each), determine the similarity between the two lists on a continuous scale from [0, 5]. My supervised model is DeepMind's Differentiable Neural Computer (DNC). The problem is that I do not believe it is learning anything; all of the output from the model is centered around 0 and even negative. I do not know how it could possibly be negative given that no negative labels are provided. I only call sess.run(loss) for the single batch; I do not create a Python loop to iterate through it.
So, what is the most efficient way to iterate the training of a model, and how do people go about it? Do they really use Python loops to make multiple calls to sess.run(loss)? (This was done in the training example file for DNC, and I have seen it in other examples as well.) I am certain I get the final loss from the below process, but I am uncertain whether the model has actually been trained entirely just because the loss was processed in one go. I also do not understand the point of the update_ops returned by some functions, and am uncertain whether they are necessary to ensure the model has been trained.
Example of what I mean by processing a batch's loss once:
# assume the model has been defined prior through batch_output_logits
train_loss = tf.losses.mean_squared_error(labels=target,
                                           predictions=batch_output_logits)

with tf.Session() as sess:
    sess.run(init_op)  # pseudo code, unnecessary for question
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    # is this the entire batch's loss && model has been trained for that batch?
    loss_np = sess.run(train_step, train_loss)

    coord.request_stop()
    coord.join(threads)
Any input on why I am receiving negative values when the labels are in the range [0, 5] is welcome as well (general, abstract answers are fine, because it's not the main focus). I am thinking of attempting to create a piecewise loss function, if possible, so that any values out of bounds face a rapidly growing exponential penalty. I am uncertain how to implement this, or if it would even work.
Code is currently private. Once allowed, I will make the repo public.
To run DNC model, go to the project/ directory and run python -m src.main. If there are errors you encounter feel free to let me know.
This model depends upon TensorFlow r1.2, the most recent Sonnet, and NLTK's punkt for tokenizing sentences in sts_handler.py and tests/*.
In a regression model, the network calculates the model output based on the randomly initialized values for your model parameters. That's why you're seeing negative values here; you haven't trained your model enough for it to learn that your values are only between 0 and 5.
Unless I'm missing something, you are only calculating the loss, but you aren't actually training the model. You should probably be calling sess.run(optimizer) on an optimizer, not on your loss function.
You probably need to train your model for multiple epochs (training your model for one epoch = training your model once on the entire dataset).
Batches are used because it is more computationally efficient to train your model on a batch than it is to train it on a single example. However, your data seems to be small enough that you won't have that problem. As such, I would recommend reducing your batch size to as low as possible. As a general rule, you get better training from a smaller batch size, at the cost of added computation.
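To make those points concrete, here is a minimal TF1-style sketch (train_loss, target and batch_output_logits are the names from the question; inputs, num_epochs and batches are illustrative placeholders):

optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
train_op = optimizer.minimize(train_loss)        # this op is what actually updates the weights

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(num_epochs):              # several passes over the whole dataset
        for x_batch, y_batch in batches:         # one sess.run call per (small) batch
            _, loss_np = sess.run(
                [train_op, train_loss],
                feed_dict={inputs: x_batch, target: y_batch})
        print('epoch', epoch, 'loss on last batch', loss_np)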
If you post all of your code, I can take a look.

Building Speech Dataset for LSTM binary classification

I'm trying to do binary LSTM classification using theano.
I have gone through the example code however I want to build my own.
I have a small set of "Hello" & "Goodbye" recordings that I am using. I preprocess these by extracting the MFCC features and saving them in a text file. I have 20 speech files (10 of each) and I am generating a text file for each recording, so 20 text files that contain the MFCC features. Each file is a 13x56 matrix.
My problem now is: How do I use this text file to train the LSTM?
I am relatively new to this. I have gone through some literature on it as well but have not found a really good explanation of the concept.
Any simpler way using LSTM's would also be welcome.
There are many existing implementations, for example a TensorFlow implementation and a Kaldi-focused implementation with all the scripts; it is better to check them first.
Theano is too low-level; you might try Keras instead, as described in the tutorial. You can run the tutorial "as is" to understand how things work.
Then you need to prepare a dataset. You need to turn your data into sequences of data frames, and for every data frame in the sequence you need to assign an output label.
Keras supports two types of RNNs - layers returning sequences and layers returning single values. You can experiment with both; in code you just use return_sequences=True or return_sequences=False.
To train with sequences you can assign a dummy label to all frames except the last one, where you assign the label of the word you want to recognize. You need to place the inputs and output labels into arrays. So it will be:
X = [[word1frame1, word1frame2, ..., word1framen],[word2frame1, word2frame2,...word2framen]]
Y = [[0,0,...,1], [0,0,....,2]]
In X every frame is a vector of 13 floats. In Y every element is just a number: 0 for intermediate frames and the word ID for the final frame.
To train with just labels you also place the inputs and output labels into arrays, but the output array is simpler. So the data will be:
X = [[word1frame1, word1frame2, ..., word1framen],[word2frame1, word2frame2,...word2framen]]
Y = [[0,0,1], [0,1,0]]
Note that the output is vectorized (np_utils.to_categorical) to turn it into one-hot vectors instead of plain numbers.
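As a rough sketch of this second (one label per sequence) variant, assuming the 13x56 MFCC matrices from the question are stored one per file under names like features/hello_01.txt (the file layout is an assumption):

import glob
import numpy as np
from keras.utils import np_utils

X, y = [], []
for label, word in enumerate(['hello', 'goodbye']):
    for path in sorted(glob.glob('features/%s_*.txt' % word)):
        mfcc = np.loadtxt(path)      # 13 x 56 matrix of MFCC features
        X.append(mfcc.T)             # -> 56 frames, 13 coefficients per frame
        y.append(label)

X = np.array(X)                      # shape: (20, 56, 13)
Y = np_utils.to_categorical(y)       # shape: (20, 2), one-hot word labels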
Then you create the network architecture. You can have 13 floats for input and a vector for output. In the middle you might have one fully connected layer followed by one LSTM layer. Do not use layers that are too big; start with small ones.
Then you feed this dataset into model.fit and it trains the model. You can estimate model quality on a held-out set after training.
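A small Keras sketch of that architecture for the one-label-per-sequence case (layer sizes are illustrative, not tuned):

from keras.models import Sequential
from keras.layers import Dense, LSTM, TimeDistributed

model = Sequential()
# a small fully connected layer applied to every frame, followed by one LSTM layer
model.add(TimeDistributed(Dense(32, activation='relu'), input_shape=(56, 13)))
model.add(LSTM(32, return_sequences=False))      # one output per sequence
model.add(Dense(2, activation='softmax'))        # two words: hello / goodbye
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(X, Y, epochs=50, batch_size=4, validation_split=0.2)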
You will have a problem with convergence since you have just 20 examples. You need way more examples, preferably thousands, to train an LSTM; with so little data you will only be able to use very small models.