Do OpenCV's machine learning algorithms continuously update a model? - c++

Most machine learning algorithms implemented in OpenCV 2.4 are built upon CvStatModel, which comes with a CvStatModel::train method.
There it says:
By default, the input feature vectors are stored as train_data rows, that is, all the components (features) of a training vector are stored continuously.
and
Usually, the previous model state is cleared by CvStatModel::clear() before running the training procedure. However, some algorithms may optionally update the model state with the new training data, instead of resetting it.
How do I know which ml algorithms don't reset the current model state? Since I wanted to use CvGBTrees::train, which has an update parameter declared as being only a dummy parameter, I guess the model is discarded after every training call. Can I take it that if there is no such update parameter, the current model state will always be discarded?
I need a machine learning algorithm which continuously trains one model and doesn't start with a fresh model on every training call.
Is this doable with the current ml implementations in OpenCV, and if so, with which ones? Furthermore, if not, are there other C++ libraries that would do so?
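For reference, at least some 2.4 models do support warm starts: CvNormalBayesClassifier::train has a functional update parameter, and CvANN_MLP::train takes an UPDATE_WEIGHTS flag that continues training from the current weights instead of reinitializing them. A minimal sketch of the latter through the 2.4 Python bindings (assuming your build exposes the usual cv2.ANN_MLP wrapper and constants; the C++ calls are analogous):

import numpy as np
import cv2  # OpenCV 2.4.x bindings assumed

# One training vector per row, as the documentation quoted above describes.
train_data = np.random.rand(100, 2).astype(np.float32)
responses = np.random.rand(100, 1).astype(np.float32)

mlp = cv2.ANN_MLP()
mlp.create(np.array([2, 5, 1]))  # layer sizes: 2 inputs, 5 hidden, 1 output

params = dict(term_crit=(cv2.TERM_CRITERIA_COUNT | cv2.TERM_CRITERIA_EPS, 100, 0.01),
              train_method=cv2.ANN_MLP_TRAIN_PARAMS_BACKPROP,
              bp_dw_scale=0.1, bp_moment_scale=0.1)

mlp.train(train_data, responses, None, params=params)  # initial training
mlp.train(train_data, responses, None, params=params,
          flags=cv2.ANN_MLP_UPDATE_WEIGHTS)  # updates, rather than resets, the model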

Related

Correct approach to improve/retrain an offline model

I have a recommendation system that was trained using Behavior Cloning (BC) on offline data generated by a supervised learning model and converted to batch format using the approach described here. Currently, the model is exploring using an e-greedy strategy. I want to migrate from BC to MARWIL by changing the beta parameter.
There are a couple of ways to do that:
1. Convert the data employed to train the BC algorithm plus the agent's new data, and retrain from scratch using MARWIL.
2. Convert the new data generated by the agent and put it together with the previously converted data employed to train the BC algorithm (using the input parameter, similar to what is described here), and retrain from scratch using MARWIL.
3. Convert the new data generated by the agent and put it together with the previously converted data employed to train the BC algorithm (using the input parameter, similar to what is described here), and retrain the restored BC agent using MARWIL (see the sketch after the questions below).
Questions:
Following option 1:
Given that the new data slice would be very small compared with the previous one, would the model learn anything new?
When should we stop using the original data?
Following option 2:
Given that the new data slice would be very small compared with the previous one, would the model learn anything new?
When should we stop using the original data?
This approach works for trajectories associated with new episode ids, but will it extend the trajectories of episodes already present in the original batch?
Following option 3:
Given that the new data slice would be very small compared with the previous one, would the model learn anything new?
When should we stop using the original data?
This approach works for trajectories associated with new episode ids, but will it extend the trajectories of episodes already present in the original batch?
Retraining would update the networks' weights using the new data points, but how many iterations should we use for that?
How can we prevent catastrophic forgetting?
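A rough sketch of option 3 with RLlib's Python API (version-dependent; the environment name, data paths, checkpoint path, and iteration count are placeholders, and beta=0 would reduce MARWIL back to plain BC):

from ray.rllib.agents.marwil import MARWILTrainer

config = {
    "input": ["/data/bc_batches", "/data/new_agent_batches"],  # old + new offline data
    "beta": 1.0,  # beta > 0 moves from pure imitation towards MARWIL
}

trainer = MARWILTrainer(env="MyRecEnv-v0", config=config)
trainer.restore("/checkpoints/bc_agent")  # start from the BC agent instead of from scratch

num_iterations = 10  # placeholder; how many iterations to run is exactly the open question
for _ in range(num_iterations):
    results = trainer.train()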

Mxnet with symbol API: batch normalization update

I am currently training a convolutional neural network using Mxnet, with the C++ Symbol API. This network contains some BatchNorm layers, each of which holds four parameter NDArrays. Two of them, the moving_mean and moving_variance parameters, are supposed to be updated at every batch during training.
I was guessing that, since the is_train boolean of the executor's forward pass is set to true, these parameters would be updated automatically. However, for some reason, these two NDArrays remain unchanged. How so? Besides, since no gradients are computed for these two NDArrays (they are not "learnable" parameters), I have no way to update their values through the regular optimizer update function. How can I tell Mxnet, using the Symbol API, to update the moving_mean and moving_variance NDArrays?
moving_mean and moving_variance are updated during the backward pass of training, rather than during the optimization step like other parameters. One other reason these parameters could remain fixed during training is if you've set use_global_stats=True on the BatchNorm layer.
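To illustrate, a minimal Python sketch of the same Symbol-API behavior (the symbols and parameters are identical in C++; the layer name 'bn' is arbitrary). The moving statistics live in the executor's auxiliary states and change after a backward pass, unless use_global_stats freezes them:

import mxnet as mx
import numpy as np

data = mx.sym.Variable('data')
net = mx.sym.BatchNorm(data=data, fix_gamma=False,
                       use_global_stats=False,  # True would freeze the moving stats
                       name='bn')

exe = net.simple_bind(mx.cpu(), data=(4, 3, 8, 8))
exe.arg_dict['data'][:] = np.random.rand(4, 3, 8, 8)

exe.forward(is_train=True)
exe.backward(mx.nd.ones((4, 3, 8, 8)))  # moving stats are updated in the backward pass

print(exe.aux_dict['bn_moving_mean'].asnumpy())  # no longer the initial zeros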

When training a single batch, is iteration of examples necessary (optimal) in python code?

Say I have one batch that I want to train my model on. Do I simply run tf.Session()'s sess.run(batch) once, or do I have to iterate through all of the batch's examples with a loop in the session? I'm looking for the optimal way to iterate/update the training ops, such as the loss. I thought tensorflow would handle it itself, especially in the cases where tf.nn.dynamic_rnn() takes in a batch dimension for listing the examples. I thought, perhaps naively, that a for loop in the python code would be the inefficient method of updating the loss. I am using tf.losses.mean_squared_error(batch) for a regression problem.
My regression problem is given two lists of word vectors (300d each), and determines the similarity between the two lists on a continuous scale from [0, 5]. My supervised model is Deepmind's Differential Neural Computer (DNC). The problem is I do not believe it is learning anything. This is because all of the output from the model is centered around 0, and some of it is even negative; I do not know how it could possibly be negative given that no negative labels were provided. I only call sess.run(loss) for the single batch; I do not create a python loop to iterate through it.
So, what is the most efficient way to iterate the training of a model, and how do people go about it? Do they really use python loops to make multiple calls to sess.run(loss)? (This was done in the training file example for DNC, and I have seen it in other examples as well.) I am certain I get the final loss from the below process, but I am uncertain whether the model has actually been trained just because the loss was computed in one go. I also do not understand the point of the update_ops returned by some functions, and am uncertain whether they are necessary to ensure the model has been trained.
Example of what I mean by processing a batch's loss once:
# assume the model has been defined prior through batch_output_logits
train_loss = tf.losses.mean_squared_error(labels=target,
                                          predictions=batch_output_logits)
with tf.Session() as sess:
    sess.run(init_op)  # pseudo code, unnecessary for question
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    # is this the entire batch's loss && has the model been trained for that batch?
    _, loss_np = sess.run([train_step, train_loss])
    coord.request_stop()
    coord.join(threads)
Any input on why I am receiving negative values when the labels are in the range [0, 5] is welcome as well (general abstract answers to this are fine, because it's not the main focus). I am thinking of attempting to create a piece-wise loss function, if possible, so that any values out of bounds face a rapidly growing exponential penalty. I am uncertain how to implement this, or whether it would even work.
Code is currently private. Once allowed, I will make the repo public.
To run the DNC model, go to the project/ directory and run python -m src.main. If you encounter any errors, feel free to let me know.
This model depends upon Tensorflow r1.2, the most recent Sonnet, and NLTK's punkt tokenizer for tokenizing sentences in sts_handler.py and tests/*.
In a regression model, the network calculates the model output based on the randomly initialized values for your model parameters. That's why you're seeing negative values here; you haven't trained your model enough for it to learn that your values are only between 0 and 5.
Unless I'm missing something, you are only calculating the loss, but you aren't actually training the model. You should probably be calling sess.run(optimizer) on an optimizer, not on your loss function.
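For example (a minimal sketch; train_loss is the loss defined in the question, and the choice of optimizer is arbitrary):

import tensorflow as tf

optimizer = tf.train.AdamOptimizer(1e-3)
train_op = optimizer.minimize(train_loss)  # this op is what actually updates the weights

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1000):  # multiple update steps, not a single run
        _, loss_np = sess.run([train_op, train_loss])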
You probably need to train your model for multiple epochs (training your model for one epoch = training your model once on the entire dataset).
Batches are used because it is more computationally efficient to train your model on a batch than it is to train it on a single example. However, your data seems to be small enough that you won't have that problem. As such, I would recommend reducing your batch size to as low as possible. As a general rule, you get better training from a smaller batch size, at the cost of added computation.
If you post all of your code, I can take a look.

On loading the saved Keras sequential model, my test data gives low accuracy in the beginning

I am creating a simple sequential Keras model which will take 10k inputs in batches of 100. Each input has 3 columns, and the corresponding output is the sum of that row.
The Sequential model has 2 layers: LSTM (stateful=True) and Dense.
Now, after compiling and fitting the model, I save it to a 'model.h5' file.
Then, I read the saved model and call model.predict with test data (size=10k, batch_size=100).
Problem: the prediction doesn't work properly for the first 400-500 inputs, while for the rest it works perfectly fine with very low val_loss.
Case 1: I make the LSTM layer stateless (i.e. stateful=False).
In this case Keras provides very accurate outputs for all the test data.
Case 2: Instead of saving and then reading again, if I directly apply model.predict on the model just created, all the outputs come out accurately.
But I need stateful=True; also, I want to save my model and then resume work on that model later.
1. Is there any way to solve this?
2. Also, when I am providing test data, why does the model's accuracy increase? (The first 400-500 tests give inaccurate results and the rest are pretty accurate.)
Your problem seems to come from losing the hidden states of your cells. When the model is rebuilt from the file, they might be reset, and this might cause the problem.
It's a little bit cumbersome, but you could also save and load the states of your network.
How to save (assuming that the i-th layer is a recurrent one):
hidden_state = model.layers[i].states[0].eval()
cell_state = model.layers[i].states[1].eval()  # note: states[1] holds the cell state
numpy.save("some name", hidden_state)
numpy.save("some other name", cell_state)
When you reload the model, you can then set the hidden state in the layer again.
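A sketch of the loading side using the Keras backend (assuming model has been reloaded, i is again the index of the recurrent layer, and the file names match the ones used above; numpy.save appends the .npy extension):

import numpy as np
from keras import backend as K

hidden_state = np.load("some name.npy")
cell_state = np.load("some other name.npy")

K.set_value(model.layers[i].states[0], hidden_state)
K.set_value(model.layers[i].states[1], cell_state)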
Of course, it's best to pack all of these methods into some kind of object, e.g. into class constructor methods.

tensorflow repeated running of fully connected model

Question:
How can I "rerun" tensorflow code that depends on queues? Is the best way really to close the session, build the model again, load the variables, and run?
Motivation:
In a still-unanswered question I asked how, in a fully connected model, one could interleave actions (such as generating cumulative summaries, calculating AUC on test data, etc.) with training that reads data from tensorflow TFRecords files and tf.Queues.
For example, tf.train.string_input_producer returns a filename_queue. As part of its constructor it takes a num_epochs arg. Instead of setting num_epochs to 100, I'm thinking of just setting num_epochs to 2 to generate summaries every other epoch. This requires running the same code 50 times, hence the need for an efficient answer to the above.
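The pattern I have in mind would look roughly like this (a sketch only; train_op, filenames, and checkpoint_path are placeholders for the real pipeline), rebuilding the graph and session on every round:

import tensorflow as tf

for round_idx in range(50):
    with tf.Graph().as_default(), tf.Session() as sess:
        filename_queue = tf.train.string_input_producer(filenames, num_epochs=2)
        # ... build the reading pipeline and model on top of filename_queue ...
        saver = tf.train.Saver()
        sess.run(tf.local_variables_initializer())  # the num_epochs counter is a local variable
        if round_idx == 0:
            sess.run(tf.global_variables_initializer())
        else:
            saver.restore(sess, checkpoint_path)  # reload weights from the last round
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)
        try:
            while not coord.should_stop():
                sess.run(train_op)  # train until the two epochs are exhausted
        except tf.errors.OutOfRangeError:
            pass
        finally:
            coord.request_stop()
            coord.join(threads)
        saver.save(sess, checkpoint_path)
        # ... generate summaries / compute AUC here before the next round ...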