For my ML Engine prediction graph, I have a part of the graph which takes a long time to compute and is not always necessary. Is there a way to create a boolean flag that will skip over this section of the graph? I would like to pass this flag when creating a batch prediction job or an online prediction. For example, it would be something like:
gcloud ml-engine predict --model $MODEL --version $VERSION --json-instances $JSON_INSTANCES --boolean_flag $BOOLEAN_FLAG
In the example above, I would pass either True or False as $BOOLEAN_FLAG, and this would determine whether a part of the prediction graph is evaluated. I imagine this flag could also be passed in the body of the batch prediction job, just like model/version are. Is this at all possible?
I know that I could add a new input field to the prediction request that is True/False for each element in the batch and just pass it as False when I don't want to obtain the prediction, but I'm curious whether there is a way to do this with just a single parameter.
This is not currently possible. We'd like to hear more about your requirements for this feature. Please reach out to us at cloudml-feedback@google.com
How about adding two different export signatures, each with a different head? You could then deploy them as two different endpoints and choose the URL to call depending on whether you want the full or the partial graph.
Write two serving input functions, one for each case. In the first case, set the flag to zero, and in the second case, set the flag to one. The reason to use ones_like and zeros_like is to ensure that you have a batch of zeros and ones:
def case1_serving_input_fn():
    feature_placeholders = ...
    features = ...
    # Force the flag to 0 for every example in the batch.
    features['myflag'] = tf.zeros_like(features['other'])
    return tf.estimator.export.ServingInputReceiver(features, feature_placeholders)

def case2_serving_input_fn():
    feature_placeholders = ...
    features = ...
    # Force the flag to 1 for every example in the batch.
    features['myflag'] = tf.ones_like(features['other'])
    return tf.estimator.export.ServingInputReceiver(features, feature_placeholders)
In your train_and_evaluate function, have two exporters:
def train_and_evaluate(output_dir, nsteps):
    ...
    exporter1 = tf.estimator.LatestExporter('case1', case1_serving_input_fn)
    exporter2 = tf.estimator.LatestExporter('case2', case2_serving_input_fn)
    eval_spec = tf.estimator.EvalSpec(
        input_fn=make_input_fn(eval_df, 1),
        exporters=[exporter1, exporter2])
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
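For the flag to actually save compute, the model graph itself has to branch on the injected myflag feature. Below is a minimal, hypothetical sketch of how that gating could look with tf.cond; the names expensive_subgraph, cheap_default, and maybe_expensive are placeholders I made up, not part of the original answer.

import tensorflow as tf

def expensive_subgraph(x):
    # Stand-in for the slow part of the prediction graph.
    return tf.reduce_sum(tf.square(tf.cast(x, tf.float32)))

def cheap_default(x):
    # Cheap value returned when the flag is off.
    return tf.constant(0.0)

def maybe_expensive(features):
    # The serving input functions above inject 'myflag' as all zeros or all ones,
    # so reducing it to a single boolean gives one predicate per request.
    run_expensive = tf.reduce_any(tf.cast(features['myflag'], tf.bool))
    # tf.cond adds both branches to the graph but only executes the chosen one
    # at run time, so the expensive ops are skipped when the flag is zero.
    return tf.cond(run_expensive,
                   lambda: expensive_subgraph(features['other']),
                   lambda: cheap_default(features['other']))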
I am in the middle of migrating some pipelines to Airflow. I want to be able to run some DAGs for specific time ranges for historical loads, and I am exploring my options. Note: I don't want to re-execute previous runs (for example the past 10 days); I want to be able to reload data based on a last_loaded timestamp variable (e.g. 2017-12-09 00:00:00.000000) anytime I need to, even for dates before the DAGs were created. This variable is also used externally to call APIs.
In total, I have four approaches in mind:
XCom: the current DAG run exchanges this variable via the xcom table in the metadata DB, but every time I want to modify it I have to update a field whose data type is blob, and I am not even sure that is possible.
Keep this parameter somewhere else: easy to implement, but I don't want to reinvent the wheel; if Airflow already provides some functionality for this, I would like to explore it.
Airflow Variables: perhaps not the most endorsed Airflow concept so far, but I do feel this is what I want.
Backfill: if I am not mistaken, this is tied to previous executions, so if my DAG started running daily in December I won't be able to load data from August.
Any advice please?
For this use case you can process the ETL as follows:
Read the last last_loaded value from the Variable.
Run the ETL between last_loaded and current_timestamp, execution_date, or whatever upper boundary you choose.
Store that upper boundary back into the Variable.
A skeleton overview could be:
from datetime import datetime

from airflow.models import Variable
from airflow.operators.python_operator import PythonOperator


def set_dag_variables(**kwargs):
    new_value = kwargs['var_value']
    Variable.set(key=DAG_ID, value=new_value, serialize_json=True)


last_loaded = Variable.get(key=var_name)  # don't do this in production; use a macro instead.
your_higher_boundary_param = datetime.now(tz=None)

op1 = YourOperator(
    task_id='op1_task',
    params={'param1': last_loaded,
            'param2': your_higher_boundary_param}
)

op2 = PythonOperator(
    task_id='set_dag_variable_task',
    provide_context=True,
    python_callable=set_dag_variables,
    op_kwargs={'var_value': your_higher_boundary_param}
)

op1 >> op2
Note: this is very high level and the details do matter!
For example, I used Variable.get outside of operator/macro scope, which is a bad practice. The proper way is to use a macro, but I simplified it for the purpose of the example.
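For reference, here is a rough sketch of the macro-based variant I have in mind (assuming a BashOperator and a hypothetical run_etl.sh script): the Jinja macros are rendered when the task runs, not when the DAG file is parsed, which avoids calling Variable.get at parse time.

from airflow.operators.bash_operator import BashOperator

op1 = BashOperator(
    task_id='op1_task',
    bash_command=(
        'run_etl.sh '
        '--since "{{ var.value.last_loaded }}" '   # Airflow Variable resolved via macro at run time
        '--until "{{ ts }}"'                       # execution timestamp macro as the upper boundary
    ),
)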
We have a huge set of data in CSV format, containing a few numeric elements, like this:
Year,BinaryDigit,NumberToPredict,JustANumber, ...other stuff
1954,1,762,16, ...other stuff
1965,0,142,16, ...other stuff
1977,1,172,16, ...other stuff
The thing here is that there is a strong correlation between the third column and the columns before that. So I have pre-processed the data and it's now available in a format I think is perfect:
1954,1,762
1965,0,142
1977,1,172
What I want is a prediction of the value in the third column, using the first two as input. So in the case above, I want the input 1965,0 to return 142. In real life this file has thousands of rows, but since there's a pattern, I'd like to retrieve the most probable value.
So far I've set up a training job on the CSV file using the Linear Learner algorithm, with the following settings:
label_size = 1
feature_dim = 2
predictor_type = regression
I've also created a model from it and set up an endpoint. When I invoke it, I get a score in return.
response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                   ContentType='text/csv',
                                   Body=payload)
My goal here is to get the third-column prediction instead. How can I achieve that? I have read a lot of the documentation regarding this, but since I'm not very familiar with AWS, I may well have used the wrong algorithm for what I am trying to do.
(Please feel free to edit this question to better suit AWS terminology)
For CSV input, the label should be in the first column, as mentioned in the linear learner documentation. So you should preprocess your data to put the label (the column you want to predict) on the left.
Next, you need to decide whether this is a regression problem or a classification problem.
If you want to predict a number that's as close as possible to the true number, that's regression. For example, the truth might be 4, and the model might predict 4.15. If you need an integer prediction, you could round the model's output.
If you want the prediction to be one of a few categories, then you have a classification problem. For example, we might encode 'North America' = 0, 'Europe' = 1, 'Africa' = 2, and so on. In this case, a fractional prediction wouldn't make sense.
For regression, use 'predictor_type' = 'regressor' and for classification with more than 2 classes, use 'predictor_type' = 'multiclass_classifier' as documented here.
The output of regression will contain only a 'score' field, which is the model's prediction. The output of multiclass classification will contain a 'predicted_label' field, which is the model's prediction, as well as a 'score' field, which is a vector of probabilities representing the model's confidence. The index with the highest probability will be the one that's predicted as the 'predicted_label'. The output formats are documented here.
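As a rough illustration (reusing the runtime, ENDPOINT_NAME, and regressor setup from the question, and assuming the JSON output format described above), reading the predicted value out of the regression response could look like this:

import json

response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                   ContentType='text/csv',
                                   Accept='application/json',
                                   Body='1965,0')    # the two feature columns
result = json.loads(response['Body'].read().decode('utf-8'))

# With predictor_type='regressor', each prediction carries a single 'score',
# which is the model's estimate of the label column.
predicted_value = result['predictions'][0]['score']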
predictor_type = regression is not able to return the predicted label, according to the linear learner documentation:
For inference, the linear learner algorithm supports the application/json, application/x-recordio-protobuf, and text/csv formats. For binary classification models, it returns both the score and the predicted label. For regression, it returns only the score.
For more information on input and output file formats, see Linear Learner Response Formats for inference, and the Linear Learner Sample Notebooks.
Hi everyone!
My problem is optical flow generation. I have two raw images and an optical flow field as ground truth. My algorithm generates an optical flow from the raw images, and the Euclidean distance between the generated flow and the ground truth can be defined as the loss value, so backpropagation can be used to update the parameters.
I treat it as a regression problem, and I have two ideas now:
I can set every parameter to requires_grad=True, compute a loss, and then call loss.backward() to obtain the gradients, but I don't know how to add these parameters to an optimizer so they get updated.
I can write my algorithm as a model. If I design a "custom" model, I can initialize layers such as nn.Conv2d() and nn.Linear() in __init__() and update their parameters with something like torch.optim.Adam(model.parameters()), but if I define new layers myself, how do I add those layers' parameters to the collection of parameters being updated?
This problem has confused me for several days. Are there any good ways to update user-defined parameters? I would be very grateful for any advice!
Tensors have their gradients calculated if they:
Have requires_grad == True
Are used to compute some value (usually the loss) on which you call .backward().
The gradients will then be accumulated in their .grad attribute. You can manually use them in order to perform arbitrary computation (including optimization). The predefined optimizers accept an iterable of parameters, and model.parameters() does just that - it returns an iterable of parameters. If you have some custom "free-floating" parameters you can pass them as
my_params = [my_param_1, my_param_2]
optim = torch.optim.Adam(my_params)
and you can also merge them with the other parameter iterables like below:
model_params = list(model.parameters())
my_params = [my_param_1, my_param_2]
optim = torch.optim.Adam(model_params + my_params)
In practice however, you can usually structure your code to avoid that. There's the nn.Parameter class which wraps tensors. All subclasses of nn.Module have their __setattr__ overridden so that whenever you assign an instance of nn.Parameter as its property, it will become a part of Module's .parameters() iterable. In other words
class MyModule(nn.Module):
def __init__(self):
super(MyModule, self).__init__()
self.my_param_1 = nn.Parameter(torch.tensor(...))
self.my_param_2 = nn.Parameter(torch.tensor(...))
will allow you to write
module = MyModule()
optim = torch.optim.Adam(module.parameters())
and have the optim update module.my_param_1 and module.my_param_2. This is the preferred way to go, since it helps keep your code more structured (a fuller sketch follows this list):
You won't have to manually include all your parameters when creating the optimizer
You can call module.zero_grad() and zero out the gradient on all its children nn.Parameters.
You can call methods such as module.cuda() or module.double() which, again, work on all children nn.Parameters instead of requiring you to manually iterate through them.
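To tie it together, here is a small self-contained sketch (the shapes, the toy forward pass, and the loss are made up purely for illustration) of a module with custom nn.Parameters driven by a standard optimizer loop:

import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        # Registered automatically via nn.Module.__setattr__, so they show up
        # in module.parameters().
        self.my_param_1 = nn.Parameter(torch.randn(3))
        self.my_param_2 = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        # Arbitrary computation using the custom parameters.
        return (x * self.my_param_1).sum() + self.my_param_2

module = MyModule()
optim = torch.optim.Adam(module.parameters(), lr=1e-2)

x = torch.randn(3)
target = torch.tensor(1.0)

for _ in range(100):
    optim.zero_grad()                         # clears .grad on every registered parameter
    loss = ((module(x) - target) ** 2).sum()  # any differentiable scalar works as a loss
    loss.backward()                           # populates .grad on my_param_1 and my_param_2
    optim.step()                              # applies the update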
I built a pymc3 model using the DensityDist distribution. I have four parameters, of which three use Metropolis and one uses NUTS (this is chosen automatically by pymc3). However, I get two different UserWarnings:
1. Chain 0 contains number of diverging samples after tuning. If increasing target_accept does not help, try to reparameterize.
May I know what reparameterize means here?
2. The acceptance probability in chain 0 does not match the target. It is , but should be close to 0.8. Try to increase the number of tuning steps.
Digging through a few examples, I used random_seed, discard_tuned_samples, step = pm.NUTS(target_accept=0.95), and so on, and got rid of these user warnings. But I couldn't find details of how these parameter values should be decided. I'm sure this has been discussed in various contexts, but I am unable to find solid documentation for it. I was doing trial and error as below.
with patten_study:
    # SEED = 61290425  # 51290425
    step = pm.NUTS(target_accept=0.95)
    trace = pm.sample(step=step)  # 4000, tune=10000, step=step, discard_tuned_samples=False, random_seed=SEED
I need to run this on different datasets, so I am struggling to fix these parameter values for each dataset I am using. Is there a way to supply these values, check the outcome (whether there are any user warnings, and then try other values), and run it in a loop?
Pardon me if I am asking something stupid!
In this context, re-parametrization basically means finding a different but equivalent model that is easier to compute. There are many things you can do, depending on the details of your model:
Instead of using a Uniform distribution you can use a Normal distribution with a large variance.
Changing from a centered hierarchical model to a non-centered one (a small sketch follows below).
Replacing a Gaussian with a Student-T.
Modeling a discrete variable as a continuous one.
Marginalizing variables, like in this example.
Whether these changes make sense or not is something that you should decide based on your knowledge of the model and problem.
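As a concrete illustration of the non-centered idea, here is a toy hierarchical model (invented for this answer, not taken from the question): instead of sampling theta directly from Normal(mu, sigma), you sample a standard normal and scale/shift it, which gives an equivalent model with a geometry that NUTS handles much better.

import pymc3 as pm

# Centered form: often produces divergences when sigma is small.
with pm.Model() as centered:
    mu = pm.Normal('mu', mu=0., sd=5.)
    sigma = pm.HalfNormal('sigma', sd=5.)
    theta = pm.Normal('theta', mu=mu, sd=sigma, shape=8)

# Non-centered (re-parametrized) form: same model, easier to sample.
with pm.Model() as non_centered:
    mu = pm.Normal('mu', mu=0., sd=5.)
    sigma = pm.HalfNormal('sigma', sd=5.)
    theta_raw = pm.Normal('theta_raw', mu=0., sd=1., shape=8)
    theta = pm.Deterministic('theta', mu + sigma * theta_raw)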
The training data is read from two .npy files: say train_set is X and train_label is Y, so it is not a multiple-input case. My task requires augmenting the image patches in different ways, so how can I define a different ImageDataGenerator for each kind of patch? Although there could be many patches, I use 3 patches as an example:
# for patch 1:
datagen1 = ImageDataGenerator(rotation_range=20)
# for patch 2:
datagen2 = ImageDataGenerator(rotation_range=40)
# for patch 3:
datagen3 = ImageDataGenerator(rotation_range=60)
How do I apply different generators to different patches, and how should I use model.fit(...) or model.fit_generator(...) in this scenario?
Also, is there a way to rotate the image by a specific angle instead of a range?
Thanks!
I haven't done this myself, but I think one approach is to use the first datagen and pass the first group of training data to fit_generator for a chosen number of epochs. Then save the weights, load them back, and train with the second datagen and the second group via fit_generator, setting initial_epoch accordingly. In other words, what you need to do is resume training with the second datagen. Please see https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model.
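A rough sketch of that idea, assuming an already-compiled model and that the patch groups are split into (x1, y1) and (x2, y2) arrays (the file name, batch size, and epoch counts are placeholders):

from keras.preprocessing.image import ImageDataGenerator

# Stage 1: train on the first patch group with its own augmentation.
datagen1 = ImageDataGenerator(rotation_range=20)
model.fit_generator(datagen1.flow(x1, y1, batch_size=32),
                    steps_per_epoch=len(x1) // 32,
                    epochs=10)
model.save_weights('stage1.h5')

# Stage 2: resume training on the second group with a different augmentation.
datagen2 = ImageDataGenerator(rotation_range=40)
model.load_weights('stage1.h5')
model.fit_generator(datagen2.flow(x2, y2, batch_size=32),
                    steps_per_epoch=len(x2) // 32,
                    epochs=20,
                    initial_epoch=10)   # continue the epoch count from stage 1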