About autograd in pyorch, Adding new user-defined layers, how should I make its parameters update? - computer-vision

everyone !
My demand is a optical-flow-generating problem. I have two raw images and a optical flow data as ground truth, now my algorithm is to generate optical flow using raw images, and the euclidean distance between generating optical flow and ground truth could be defined as a loss value, so it can implement a backpropagation to update parameters.
I take it as a regression problem, and I have to ideas now:
I can set every parameters as (required_grad = true), and compute a loss, then I can loss.backward() to acquire the gradient, but I don’t know how to add these parameters in optimizer to update those.
I write my algorithm as a model. If I design a “custom” model, I can initilize several layers such as nn.Con2d(), nn.Linear() in def init() and I can update parameters in methods like (torch.optim.Adam(model.parameters())), but if I define new layers by myself, how should I add this layer’s parameters in updating parameter collection???
This problem has confused me several days. Are there any good methods to update user-defined parameters? I would be very grateful if you could give me some advice!

Tensor values have their gradients calculated if they
Have requires_grad == True
Are used to compute some value (usually loss) on which you call .backward().
The gradients will then be accumulated in their .grad parameter. You can manually use them in order to perform arbitrary computation (including optimization). The predefined optimizers accept an iterable of parameters and model.parameters() does just that - it returns an iterable of parameters. If you have some custom "free-floating" parameters you can pass them as
my_params = [my_param_1, my_param_2]
optim = torch.optim.Adam(my_params)
and you can also merge them with the other parameter iterables like below:
model_params = list(model.parameters())
my_params = [my_param_1, my_param_2]
optim = torch.optim.Adam(model_params + my_params)
In practice however, you can usually structure your code to avoid that. There's the nn.Parameter class which wraps tensors. All subclasses of nn.Module have their __setattr__ overridden so that whenever you assign an instance of nn.Parameter as its property, it will become a part of Module's .parameters() iterable. In other words
class MyModule(nn.Module):
def __init__(self):
super(MyModule, self).__init__()
self.my_param_1 = nn.Parameter(torch.tensor(...))
self.my_param_2 = nn.Parameter(torch.tensor(...))
will allow you to write
module = MyModule()
optim = torch.optim.Adam(module.parameters())
and have the optim update module.my_param_1 and module.my_param_2. This is the preferred way to go, since it helps keep your code more structured
You won't have to manually include all your parameters when creating the optimizer
You can call module.zero_grad() and zero out the gradient on all its children nn.Parameters.
You can call methods such as module.cuda() or module.double() which, again, work on all children nn.Parameters instead of requiring to manually iterate through them.

Related

Mxnet with symbol API: batch normalization update

I am currently training a convolutional neural network using Mxnet, with the C++ Symbol API. This network contains some Batchnormalization layers, which contains the four parameter NDArray. Two of them, the moving_mean and moving_variance parameter are supposed to be updated at every batch during the training.
I was guessing that, since the boolean for the forward pass of the executor is set to true, it would update automatically the new parameters. However, for some reasons, these two NDArray remains still, without any update of the parameter. How so? Besides, since there are no gradients computed for these two NDArray, because it is not "learnable" parameters, I have no way to update the values through the regular optimizer update function. How to tell Mxnet, using the symbol API, to update the moving_mean and moving_variance NDArrays?
moving_mean and moving_variance are updated during the backward pass of training, rather than during the optimization step like other parameters. One other reason these parameters could remain fixed during training is if you've set use_global_stats=True on the BatchNorm layer.

Pass Parameter to GCMLE Prediction Graph

Ror my ML Engine Prediction Graph, I have a part of the graph which takes a long time to compute and is not always necessary. Is there a way to create a boolean flag that will skip over this section of the graph? I would like to pass this flag when creation a batch predict job or an online prediction. For example, it would be something like:
gcloud ml-engine predict --model $MODEL --version $VERSION --json-instance $JSON_INSTANCES --boolean_flag $BOOLEAN_FLAG
In the example above, I would either pass True/False as the $BOOLEAN_FLAG and then this would determine whether a part of the prediction graph is evaluated. I would imagine that this flag could also be passed in the body of the batch prediction job, just like model/version are. Is this at all possible?
I know that I could add a new input field to the prediction request that is True/False for each element in the batch and just pass that as False when I don't want to obtain the prediction, but I'm curious if there is a way to do this with just a single parameter.
This is not currently possible. We'd like to hear more about your requirements for this feature. Please reach out to us at cloudml-feedback#google.com
How about adding two different export signatures, each with a different head? Then you can deploy to two different endpoints? Choose the url to call depending on whether you want full or partial.
Write two serving input functions, one for each case. In the first case, set the flag to zero, and in the second case, set the flag to one. The reason to use ones_like and zeros_like is to ensure that you have a batch of zeros and ones:
def case1_serving_input_fn():
feature_placeholders = ...
features = ...
features['myflag'] = tf.zeros_like(features['other'])
return tf.estimator.export.ServingInputReceiver(features, feature_placeholders)
def case2_serving_input_fn():
feature_placeholders = ...
features = ...
features['myflag'] = tf.ones_like(features['other'])
return tf.estimator.export.ServingInputReceiver(features, feature_placeholders)
In your train_and_evaluate function, have two exporters:
def train_and_evaluate(output_dir, nsteps):
...
exporter1 = tf.estimator.LatestExporter('case1', case1_serving_input_fn)
exporter2 = tf.estimator.LatestExporter('case2', case2_serving_input_fn)
eval_spec=tf.estimator.EvalSpec(
input_fn = make_input_fn(eval_df, 1),
exporters = [exporter1, exporter2] )
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

userWarning pymc3 : What does reparameterize mean?

I built a pymc3 model using the DensityDist distribution. I have four parameters out of which 3 use Metropolis and one uses NUTS (this is automatically chosen by the pymc3). However, I get two different UserWarnings
1.Chain 0 contains number of diverging samples after tuning. If increasing target_accept does not help try to reparameterize.
MAy I know what does reparameterize here mean?
2. The acceptance probability in chain 0 does not match the target. It is , but should be close to 0.8. Try to increase the number of tuning steps.
Digging through a few examples I used 'random_seed', 'discard_tuned_samples', 'step = pm.NUTS(target_accept=0.95)' and so on and got rid of these user warnings. But I couldn't find details of how these parameter values are being decided. I am sure this might have been discussed in various context but I am unable to find solid documentation for this. I was doing a trial and error method as below.
with patten_study:
#SEED = 61290425 #51290425
step = pm.NUTS(target_accept=0.95)
trace = sample(step = step)#4000,tune = 10000,step =step,discard_tuned_samples=False)#,random_seed=SEED)
I need to run these on different datasets. Hence I am struggling to fix these parameter values for each dataset I am using. Is there any way where I give these values or find the outcome (if there are any user warnings and then try other values) and run it in a loop?
Pardon me if I am asking something stupid!
In this context, re-parametrization basically is finding a different but equivalent model that it is easier to compute. There are many things you can do depending on the details of your model:
Instead of using a Uniform distribution you can use a Normal distribution with a large variance.
Changing from a centered-hierarchical model to a
non-centered
one.
Replacing a Gaussian with a Student-T
Model a discrete variable as a continuous
Marginalize variables like in this example
whether these changes make sense or not is something that you should decide, based on your knowledge of the model and problem.

When training a single batch, is iteration of examples necessary (optimal) in python code?

Say I have one batch that I want to train my model on. Do I simply run tf.Session()'s sess.run(batch) once, or do I have to iterate through all of the batch's examples with a loop in the session? I'm looking for the optimal way to iterate/update the training ops, such as loss. I thought tensorflow would handle it itself, especially in the cases where tf.nn.dynamic_rnn() takes in a batch dimension for listing the examples. I thought, perhaps naively, that a for loop in the python code would be the inefficient method of updating the loss. I am using tf.losses.mean_squared_error(batch) for a regression problem.
My regression problem is given two lists of word vectors (300d each), and determines the similarity between the two lists on a continuous scale from [0, 5]. My supervised model is Deepmind's Differential Neural Computer (DNC). The problem is I do not believe it is learning anything. this is due to the fact that the all of the output from the model is centered around 0 and even negative. I do not know how it could possibly be negative given no negative labels provided. I only call sess.run(loss) for the single batch, I do not create a python loop to iterate through it.
So, what is the most efficient way to iterate the training of a model and how do people go about it? Do they really use python loops to do multiple calls to sess.run(loss) (this was done in the training file example for DNC, and I have seen it in other examples as well). I am certain I get the final loss from the below process, but I am uncertain if the model has actually been trained entirely just because the loss was processed in one go. I also do not understand the point of update_ops returned by some functions, and am uncertain if they are necessary to ensure the model has been trained.
Example of what I mean by processing a batch's loss once:
# assume the model has been defined prior through batch_output_logits
train_loss = tf.losses.mean_squared_error(labels=target,
predictions=batch_output_logits)
with tf.Session() as sess:
sess.run(init_op) # pseudo code, unnecessary for question
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord)
# is this the entire batch's loss && model has been trained for that batch?
loss_np = sess.run(train_step, train_loss)
coord.request_stop()
coord.join(threads)
Any input on why I am receiving negative values when the labels are in the range [0, 5] is welcomed as well(general abstract answers for this are fine, because its not the main focus). I am thinking of attempting to create a piece-wise function, if possible, for my loss, so that for any values out of bounds face a rapidly growing exponential loss function. Uncertain how to implement, or if it would even work.
Code is currently private. Once allowed, I will make the repo public.
To run DNC model, go to the project/ directory and run python -m src.main. If there are errors you encounter feel free to let me know.
This model depends upon Tensorflow r1.2, most recent Sonnet, and NLTK's punkt for Tokenizing sentences in sts_handler.py and tests/*.
In a regression model, the network calculates the model output based on the randomly initialized values for your model parameters. That's why you're seeing negative values here; you haven't trained your model enough for it to learn that your values are only between 0 and 5.
Unless I'm missing something, you are only calculating the loss, but you aren't actually training the model. You should probably be calling sess.run(optimizer) on an optimizer, not on your loss function.
You probably need to train your model for multiple epochs (training your model for one epoch = training your model once on the entire dataset).
Batches are used because it is more computationally efficient to train your model on a batch than it is to train it on a single example. However, your data seems to be small enough that you won't have that problem. As such, I would recommend reducing your batch size to as low as possible. As a general rule, you get better training from a smaller batch size, at the cost of added computation.
If you post all of your code, I can take a look.

On loading the saved Keras sequential model, my test data gives low accuracy in the beginning

I am creating a simple sequential Keras model which will take 10k inputs in a batch of 100. Each input has 3 columns and the corresponding output is sum of that row.
Sequential model has 2 layers- LSTM(Stateful=true) , Dense.
Now, after compiling and fitting the model, I am saving it in 'model.h5' file.
Then, I read the saved model, and call model.predict with a test data (size=10k , batch_size = 100).
Problem: the prediction doesn't work properly for first 400-500 inputs and for the rest its working perfectly fine with very low val_loss.
Case1: I make the LSTM layer Stateless(i.e. Stateful=False)
In this case Keras is providing very accurate outputs for all the test data.
Case2: Instead of saving and then reading again, if I directly apply model.predict on the model created, all the outputs are coming accurately.
But, I need Stateful=True, also, I want to save my model and then resume work on that model later.
1.Is there any way to solve this?
2.Also, when I am providing test data, how is the model's accuracy increasing? ( because the first 400-500 tests provide inaccurate results and the rest are pretty accurate)
Your problem seems to come from losing the hidden states of your cells. During model building they might be reset and this might cause the problem.
So (it's a little bit cumbersome), but you could save and load also a states of your network:
How to save? (assuming that i-th layer is a recurrentone):
hidden_state = model.layers[i].states[0].eval()
cell_state = model.layers[i].states[0].eval()
numpy.save("some name", hidden_state)
numpy.save("some other name", cell_state)
Now when you can reload the hidden state, here you can read on how to set the hidden state in a layer.
Of course - it's the best to pack all of this methods in some kind of object and e.g. class constructor methods.