MXNet with Symbol API: batch normalization update - C++

I am currently training a convolutional neural network with MXNet, using the C++ Symbol API. This network contains some BatchNorm layers, each of which holds four parameter NDArrays. Two of them, moving_mean and moving_variance, are supposed to be updated at every batch during training.
I assumed that, since the boolean for the executor's forward pass is set to true, these parameters would be updated automatically. However, for some reason these two NDArrays stay unchanged, with no update to their values. Why? Moreover, since no gradients are computed for these two NDArrays (they are not "learnable" parameters), I have no way to update them through the regular optimizer update function. How do I tell MXNet, using the Symbol API, to update the moving_mean and moving_variance NDArrays?

moving_mean and moving_variance are updated during the backward pass of training, rather than during the optimization step like other parameters. One other reason these parameters could remain fixed during training is if you've set use_global_stats=True on the BatchNorm layer.
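For reference, here is a minimal sketch of where those values live when using the C++ package (an illustration, not your code: net, args_map and the context are placeholders, and it assumes the mxnet-cpp Executor, whose public aux_arrays vector holds the auxiliary states such as moving_mean and moving_variance):
#include "mxnet-cpp/MxNetCpp.h"
#include <iostream>
using namespace mxnet::cpp;

// Bind the symbol; the auxiliary states (moving_mean, moving_variance) are
// allocated by the executor and exposed through exec->aux_arrays.
Executor *exec = net.SimpleBind(Context::cpu(), args_map);

exec->Forward(true);   // is_train = true
exec->Backward();      // the running statistics are refreshed here, not by the optimizer

// Inspect the auxiliary states after the backward pass:
std::vector<std::string> aux_names = net.ListAuxiliaryStates();
for (size_t i = 0; i < aux_names.size(); ++i) {
  std::vector<mx_float> values;
  exec->aux_arrays[i].SyncCopyToCPU(&values);
  std::cout << aux_names[i] << " first value: " << values[0] << std::endl;
}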

Related

Why does dlib's neural net xml export contain different parameters for layers than specified by the trainer?

In dlib it is possible to simply output a neural net via the dlib::net_to_xml(some_net, some_filename) function. It works fine, but it also includes information such as the net type and learning_rate_mult. In my case, for example, it exports the following line for one of the layers (the rest of the exported XML is omitted for clarity):
<fc num_outputs='42' learning_rate_mult='1' weight_decay_mult='1' bias_learning_rate_mult='500' bias_weight_decay_mult='0'>
Those values are correct, except for learning_rate_mult and weight_decay_mult, which always show 1. I tried setting them to different values with the trainer class, such as 2 or 0.0001, but they keep showing 1. I verified that the values 2 and 0.0001 were indeed used by the net.
Might this be a bug in dlib's dlib::net_to_xml function?
Those values apply per layer and are independent of the trainer's values. The layer parameters are relevant for optimizers like the Adam optimization algorithm:
https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/
You can change them by specifying them on every layer.
So no, it is not a bug.
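For illustration, a rough sketch of setting per-layer multipliers in dlib (the network definition and values here are only assumptions for the example, not your code):
#include <dlib/dnn.h>

// A toy net: a single fully connected layer on top of a matrix input.
using net_type = dlib::loss_multiclass_log<
                     dlib::fc<42,
                     dlib::input<dlib::matrix<float>>>>;

int main() {
    net_type net;
    // The multipliers live on the layer itself, independently of the trainer:
    net.subnet().layer_details().set_learning_rate_multiplier(2);
    net.subnet().layer_details().set_weight_decay_multiplier(0.0001);
    // These per-layer values are what dlib::net_to_xml() writes out as
    // learning_rate_mult / weight_decay_mult.
    dlib::net_to_xml(net, "net.xml");
}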

UserWarning in pymc3: What does reparameterize mean?

I built a pymc3 model using the DensityDist distribution. I have four parameters, of which three use Metropolis and one uses NUTS (this is chosen automatically by pymc3). However, I get two different UserWarnings:
1. Chain 0 contains number of diverging samples after tuning. If increasing target_accept does not help try to reparameterize.
May I know what reparameterize means here?
2. The acceptance probability in chain 0 does not match the target. It is , but should be close to 0.8. Try to increase the number of tuning steps.
Digging through a few examples I used 'random_seed', 'discard_tuned_samples', 'step = pm.NUTS(target_accept=0.95)' and so on and got rid of these user warnings, but I couldn't find details of how these parameter values are decided. I am sure this must have been discussed in various contexts, but I am unable to find solid documentation for it. I was doing trial and error as below.
with patten_study:
    # SEED = 61290425  # 51290425
    step = pm.NUTS(target_accept=0.95)
    trace = pm.sample(step=step)  # e.g. pm.sample(4000, tune=10000, step=step, discard_tuned_samples=False, random_seed=SEED)
I need to run this on different datasets, so I am struggling to fix these parameter values for each dataset I am using. Is there a way to set these values, check the outcome (whether there are any user warnings, and if so try other values), and run it in a loop?
Pardon me if I am asking something stupid!
In this context, re-parametrization basically means finding a different but equivalent model that is easier to compute. There are many things you can do, depending on the details of your model:
Instead of using a Uniform distribution, use a Normal distribution with a large variance.
Change from a centered hierarchical model to a non-centered one (see the sketch after this list).
Replace a Gaussian with a Student-T.
Model a discrete variable as a continuous one.
Marginalize variables, like in this example.
Whether these changes make sense or not is something you should decide based on your knowledge of the model and the problem.
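As a concrete illustration of the non-centered change (generic notation, not tied to your DensityDist model):
% centered parameterization: theta_i is drawn directly from the group-level distribution
\theta_i \sim \mathcal{N}(\mu, \sigma)
% non-centered parameterization: draw a standard normal and shift/scale it deterministically
\tilde{\theta}_i \sim \mathcal{N}(0, 1), \qquad \theta_i = \mu + \sigma \, \tilde{\theta}_i
Both forms define the same distribution for \theta_i, but the second often removes the strong correlation between \theta_i and \sigma that makes NUTS diverge.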

How to save a model in TensorFlow using C++

How can I save a model in TensorFlow using C++? I have searched on Google and Baidu but have not found any solution. I then read the TensorFlow API documentation, but there is little introduction to the C++ API.
Model saving is implemented in Python only. There is currently no way to save a model using C++ APIs. C++ APIs allow you to load and use the models, not to train or save them.
Assuming you have a basic understanding of the TensorFlow C++ API and know how to construct a graph with it, you can make use of these two functions:
tensorflow::WriteTextProto(): you can get a tensorflow::GraphDef (which represents all the operations you defined, e.g. Add, Multiply, Mean, etc.) from tensorflow::Scope::ToGraphDef(), and save that tensorflow::GraphDef to a text protobuf file (a short sketch follows right after these two points).
tensorflow::checkpoint::TensorSliceWriter saves the current state of the parameter matrices to an external file (checkpoint); it's a little complicated, but it works well for me.
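A minimal sketch of the first point (assuming a tensorflow::Scope named root, as in the Variable example further below; the file path is just an illustration):
// Serialize the graph that was built under `root` into a GraphDef ...
tensorflow::GraphDef graph_def;
TF_CHECK_OK(root.ToGraphDef(&graph_def));
// ... and write it out as a human-readable text protobuf file.
TF_CHECK_OK(tensorflow::WriteTextProto(tensorflow::Env::Default(), "graph.pbtxt", graph_def));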
First you'll have to fetch the trained parameters by calling tensorflow::Session::Run, which will return a list of parameter matrices in output_tensor (see the sample below):
std::vector<tensorflow::Tensor> output_tensor;
// session is an open tensorflow::Session* that holds the trained graph
session->Run({}, {"name_of_param_mtx_1", "name_of_param_mtx_2"}, {}, &output_tensor);
where name_of_param_mtx_1 and name_of_param_mtx_2 above should be the names of your parameter matrices declared as tensorflow::ops::Variable, e.g.
auto name_of_param_mtx_1 = tensorflow::ops::Variable(root.WithOpName("name_of_param_mtx_1"), {7, 17}, tensorflow::DT_FLOAT);
Then you need to prepare the following for tensorflow::checkpoint::TensorSliceWriter (a sketch putting the pieces together follows after this list):
the base address of the raw parameter data, obtained by calling tensorflow::Tensor::tensor_data().data()
the shape of each tensorflow::Tensor, obtained by calling tensorflow::Tensor::dim_size(NUM_DIMENSION). For example, for a 7x17 2D parameter matrix, NUM_DIMENSION can be 0 or 1, where tensorflow::Tensor::dim_size(0) is 7 and tensorflow::Tensor::dim_size(1) is 17.
the name of this checkpoint entry; the name must be unique among the other entries in the same file
a tensorflow::TensorSlice, created by calling tensorflow::TensorSlice::ParseOrDie("-:-"). The only argument of tensorflow::TensorSlice::ParseOrDie seems to be analyzed internally; e.g. -:- means taking all items of the matrix. If you only want part of a trained parameter matrix, e.g. only the 2nd column of all rows, the string argument would likely be -:2, but I haven't figured out such advanced usage of tensorflow::TensorSlice::ParseOrDie.
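Putting those pieces together, a sketch might look like this (assuming output_tensor[0] holds the 7x17 float matrix fetched above; the checkpoint file name is only an example):
#include "tensorflow/core/util/tensor_slice_writer.h"

tensorflow::checkpoint::TensorSliceWriter writer(
    "checkpoint_file", tensorflow::checkpoint::CreateTableTensorSliceBuilder);

const tensorflow::Tensor &param = output_tensor[0];
// Rebuild the shape from the tensor's dimensions (7 and 17 in this example).
tensorflow::TensorShape shape({param.dim_size(0), param.dim_size(1)});
// "-:-" means "take every item of the matrix".
tensorflow::TensorSlice slice = tensorflow::TensorSlice::ParseOrDie("-:-");

// Add() is templated on the element type; a DT_FLOAT tensor maps to float.
// The data pointer comes from tensor_data().data(), as described above.
TF_CHECK_OK(writer.Add("name_of_param_mtx_1", shape, slice,
                       reinterpret_cast<const float*>(param.tensor_data().data())));
TF_CHECK_OK(writer.Finish());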
Hope that helps.

TensorFlow: repeated running of a fully connected model

Question:
How can I "rerun" tensorflow code that depends on queues? Is the best way really to close the session, build the model again, load variables and run?
Motivation:
In a still-unanswered question I asked how, in a fully connected model, one could interleave actions (such as generating cumulative summaries, calculating AUC on test data, etc.) with training that reads data from TensorFlow TFRecords files and tf.Queues.
For example, tf.train.string_input_producer returns a filename_queue. As part of its constructor it takes a "num_epochs" argument. Instead of setting "num_epochs" to 100, I'm thinking of setting "num_epochs" to 2 so summaries are generated every other epoch. This requires running the same code 50 times, hence the need for an efficient answer to the question above.

Do OpenCV's machine learning algorithms continuously update a model?

Most machine learning algorithms implemented in OpenCV 2.4 are built upon CvStatModel, which comes with a CvStatModel::train method.
There it says:
By default, the input feature vectors are stored as train_data rows, that is, all the components (features) of a training vector are stored continuously.
and
Usually, the previous model state is cleared by CvStatModel::clear() before running the training procedure. However, some algorithms may optionally update the model state with the new training data, instead of resetting it.
How do I know which ML algorithms don't reset the current model state? Since I wanted to use CvGBTrees::train, whose update parameter is declared to be only a dummy parameter, I guess the model is discarded after every training call. Can I take it that if there is no such update parameter, the current model state will always be discarded?
I need a machine learning algorithm that continuously trains one model and doesn't start from an initial model on every training call.
Is this doable with the current ML implementations in OpenCV, and if so, with which ones? Furthermore, if not, are there other C++ libraries that would do so?