I have created a custom container for prediction and successfully uploaded the model to Vertex AI. I was also able to deploy the model to an endpoint and successfully request predictions from the endpoint. Within the custom container code, I use the parameters field as described here, which I then supply later on when making an online prediction request.
My questions are regarding requesting batch predictions from a custom container for prediction.
I cannot find any documentation that describes what happens when I request a batch prediction. Say, for example, I use the my_model.batch_predict function from the Python SDK and set the instances_format to "csv" and provide the gcs_source. Now, I have set up my custom container to expect prediction requests at /predict as described in this documentation. Does Vertex AI make a POST request to this path, converting the csv data into the appropriate POST body?
How do I specify the parameters field for batch prediction as I did for online prediction?
Yes, Vertex AI makes a POST request to your custom container's predict route during batch prediction.
No, there is no way for batch prediction to pass the parameters field, since we don't know which column is a "parameter". We put everything into "instances".
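For illustration only, here is a rough sketch of how a custom container might handle such a request. The Flask framing and the my_model_predict helper below are assumptions, not part of the container described in the question; the point is that each CSV row arrives as one entry in "instances" and no "parameters" field is supplied for batch requests.

# Hypothetical /predict handler for a custom prediction container.
# Assumes each CSV row is delivered as one entry in "instances";
# "parameters" is only populated for online prediction requests.
from flask import Flask, request, jsonify

app = Flask(__name__)

def my_model_predict(row, parameters):
    # Placeholder for the real model call.
    return 0.0

@app.route("/predict", methods=["POST"])
def predict():
    body = request.get_json()
    instances = body.get("instances", [])    # one entry per CSV row
    parameters = body.get("parameters", {})  # empty/absent for batch prediction
    predictions = [my_model_predict(row, parameters) for row in instances]
    return jsonify({"predictions": predictions})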
I have trained my yolov5 model and have weights.pt; now I need to deploy it using SageMaker, and for that I need to create an endpoint.
I'm following this tutorial: https://sagemaker-examples.readthedocs.io/en/latest/frameworks/pytorch/get_started_mnist_deploy.html
Since I'm working with images, I'm trying to customise the input_fn and output_fn functions, but when I run inference I always get errors. My question is: what logic should I follow in order to customise these functions?
Each of these functions has a different purpose. Using input_fn you prepare the input for what your model is expecting. Using model_fn you load up your PyTorch model. Using predict_fn you run your predict/inference code. Using output_fn you shape the output that you return from your endpoint. Check out this article for a more in-depth understanding of each of these handlers, as well as examples: https://aws.plainenglish.io/adding-custom-inference-scripts-to-amazon-sagemaker-2208c3332510
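As a rough sketch only (the weights filename, image decoding, and output shaping below are assumptions and would need to be adapted to the actual yolov5 model and the libraries installed in the serving container), an inference script with these handlers might look like:

# inference.py -- hypothetical SageMaker PyTorch inference handlers.
import io
import json
import torch
from PIL import Image
import torchvision.transforms as T

def model_fn(model_dir):
    # Load the serialized model; depending on how the weights were saved,
    # you may need to extract the model from a checkpoint dict first.
    model = torch.load(f"{model_dir}/weights.pt", map_location="cpu")
    model.eval()
    return model

def input_fn(request_body, content_type):
    # Turn the raw request bytes into the tensor the model expects.
    if content_type in ("application/x-image", "image/jpeg", "image/png"):
        image = Image.open(io.BytesIO(request_body)).convert("RGB")
        return T.ToTensor()(image).unsqueeze(0)
    raise ValueError(f"Unsupported content type: {content_type}")

def predict_fn(input_data, model):
    # Run inference without tracking gradients.
    with torch.no_grad():
        return model(input_data)

def output_fn(prediction, accept):
    # Shape the raw model output into the response the client expects;
    # for yolov5 this typically means extracting boxes/scores/classes first.
    return json.dumps({"output": prediction[0].tolist()})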
Hi, I have set up Recommendation AI with catalog data and user data.
User events are being fed from Google Tag Manager using 3 Recommendation AI tags.
The 1st tag is for the product page, where I use the ecommerce variable to set up productid (this works as expected).
The 2nd is for the home page visit, where I placed a custom HTML Google tag that sets up the automl data layer with eventType: 'home-page-view', plus a second Recommendation AI tag that uses the automl data. (This doesn't work; I don't see any events other than detailed-page-view in the Recommender AI console.)
As a side note, I have set the firing priority of the automl data layer tag to 100 and left the Recommendation AI tag's priority empty.
Does anyone have any experience with this?
To sum up, I need to set up home-page-view and add-to-cart events, but they are not visible in Recommender AI; only detailed-page-view shows up.
UPDATE
I'm still not able to receive the home-page-view event in Recommendation AI using GTM.
I went the extra mile and added a new tag with custom HTML which creates the automl variable in the data layer as per the documentation (https://cloud.google.com/recommendations-ai/docs/user-events#tag-manager_10, copied and pasted from the documentation).
The tag I have set up has priority 100, so it runs first.
Then there is a second tag for Recommendation AI where the source of the data is set to AutoML (I have tried to override the eventType variable with the Constant 'home-page-view', but it's not giving the expected results either).
Running out of options here...
Update:
The event_type override should work now for your tag.
Original:
The event_type override on GTM is allow-list only; we would need to allow-list your GTM tag for it to work (if you provide it, I can enable it for your tag). Alternatively, do you think you could set the eventType directly in the data layer?
I have an endpoint running a trained SageMaker model on AWS, which expects the data in a specific format.
Initially, the data was processed on the client side of the application, meaning the API Gateway (which receives the POST API calls on AWS) used to receive pre-processed data. Now there's a change: the API Gateway will receive raw data from the client, and the job of pre-processing this data before sending it to our SageMaker model falls to our workflow.
What is the best way to create a pre-processing job in this workflow without needing to re-train the model? My pre-processing is just a bunch of dataframe transformations; no standardization or calculation with the training set is required (so it would not need to save any model file).
Thanks!
After some research, this is the solution I've followed:
First I created a SKLearn SageMaker model to do all the preprocessing (I built a custom Scikit-Learn class to handle all the preprocessing steps, following this AWS code).
Trained this preprocessing model on my training data. My model, specifically, didn't need to be trained (it does not do any standardization or anything else that would need to store parameters from the training data), but SageMaker requires the model to be trained.
Loaded the trained legacy model that we had using the Model parameter.
Created a PipelineModel with the preprocessing model and legacy model in cascade:
pipeline_model = PipelineModel(name=model_name,
                               role=role,
                               models=[
                                   preprocess_model,
                                   trained_model
                               ])
Created a new endpoint from the PipelineModel and then changed the Lambda function to call this new endpoint. With this I could send the raw data directly to the same API Gateway and it would call only one endpoint, without needing to pay for two endpoints running 24/7 to perform the entire process.
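For reference, a minimal sketch of that deployment step (assuming the SageMaker Python SDK; the instance type and endpoint name below are placeholders rather than real values):

# Hypothetical sketch: one endpoint serving both containers in sequence.
# The request hits the preprocessing container first and its output feeds
# the legacy model. Instance type and endpoint name are placeholders.
predictor = pipeline_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="raw-data-endpoint",
)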
I've found this to be a good and "economical" way to perform the preprocessing outside the trained model, without having to do heavy processing jobs in a Lambda function.
I would create a Lambda which is invoked by the API Gateway, processes the data, and sends it to your SageMaker endpoint.
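A minimal sketch of that idea, assuming an API Gateway proxy integration with a JSON body and the boto3 SageMaker runtime client; the endpoint name and the preprocessing logic are placeholders:

# Hypothetical Lambda handler: preprocess the raw request, then call the endpoint.
import json
import os
import boto3

runtime = boto3.client("sagemaker-runtime")
ENDPOINT_NAME = os.environ.get("ENDPOINT_NAME", "my-sagemaker-endpoint")  # placeholder

def preprocess(raw_record):
    # Placeholder for the dataframe-style transformations described above.
    return [float(raw_record["feature_a"]), float(raw_record["feature_b"])]

def lambda_handler(event, context):
    raw = json.loads(event["body"])
    features = preprocess(raw)
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps({"instances": [features]}),
    )
    prediction = response["Body"].read().decode("utf-8")
    return {"statusCode": 200, "body": prediction}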
I'm looking to make some hyperparameters available to the serving endpoint in SageMaker. The training instance is given access to input parameters using hyperparameters in:
estimator = TensorFlow(entry_point='autocat.py',
                       role=role,
                       output_path=params['output_path'],
                       code_location=params['code_location'],
                       train_instance_count=1,
                       train_instance_type='ml.c4.xlarge',
                       training_steps=10000,
                       evaluation_steps=None,
                       hyperparameters=params)
However, when the endpoint is deployed, there is no way to pass in parameters that are used to control the data processing in the input_fn(serialized_input, content_type) function.
What would be the best way to pass parameters to the serving instance? Is the source_dir parameter defined in the sagemaker.tensorflow.TensorFlow class copied to the serving instance? If so, I could use a config.yml or similar.
Ah, I have had a similar problem to yours, where I needed to download something off S3 to use in input_fn for inference. In my case it was a dictionary.
Three options:
Use your config.yml approach, and download and import the S3 file from within your entry point file, before any function declarations. This would make it available to input_fn (see the sketch after this list).
Keep using the hyperparameter approach, download and import the vectorizer in serving_input_fn and make it available via a global variable so that input_fn has access to it.
Download the file from s3 before training and include it in the source_dir directly.
Option 3 would only work if you didn't need to make changes to the vectorizer separately after the initial training.
Whatever you do, don't download the file directly in input_fn. I made that mistake and the performance is terrible, as each invocation of the endpoint would result in the S3 file being downloaded.
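A minimal sketch of option 1, assuming boto3 and PyYAML are available in the serving container; the bucket, key, and config fields are placeholders:

# entry point file -- hypothetical sketch of option 1.
import json
import boto3
import yaml

# Runs once at module import time (when the serving container loads the
# entry point), so the download cost is not paid on every invocation.
s3 = boto3.client("s3")
s3.download_file("my-config-bucket", "serving/config.yml", "/tmp/config.yml")  # placeholders

with open("/tmp/config.yml") as f:
    CONFIG = yaml.safe_load(f)

def input_fn(serialized_input, content_type):
    # CONFIG is already in memory here; use it to control the preprocessing.
    data = json.loads(serialized_input)
    max_len = CONFIG.get("max_sequence_length", 128)  # placeholder config field
    return [row[:max_len] for row in data["instances"]]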
Hyperparameters are used in the training phase to allow you to tune your model (hyperparameter optimization, HPO). Once you have a trained model, these hyperparameters are not needed for inference.
When you want to pass features to the serving instances, you usually do that in the BODY of each request to the invoke-endpoint API call (for example, see here: https://docs.aws.amazon.com/sagemaker/latest/dg/tf-example1-invoke.html) or in the call to the predict wrapper in the SageMaker Python SDK (https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/tensorflow). You can see such examples in the sample notebooks (https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/tensorflow_iris_byom/tensorflow_BYOM_iris.ipynb).
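As a brief sketch of that pattern (the instance type and feature values are placeholders; the predictor comes from deploying the estimator shown in the question):

# Hypothetical sketch: per-request features travel in the request body,
# via the predict wrapper of the predictor returned by deploy().
predictor = estimator.deploy(initial_instance_count=1,
                             instance_type='ml.m4.xlarge')

# Each prediction request carries its own features.
result = predictor.predict([6.4, 3.2, 4.5, 1.5])
print(result)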
Yes, one option is to add your configuration file to source_dir and load the file in the input_fn.
Another option is to use serving_input_fn(hyperparameters). That function transforms the TensorFlow model into a TensorFlow Serving model. For example:
import tensorflow as tf

INPUT_TENSOR_NAME = 'inputs'  # name of the serving input tensor

def serving_input_fn(hyperparameters):
    # gets the input shape from the hyperparameters
    shape = hyperparameters.get('input_shape', [1, 7])
    tensor = tf.placeholder(tf.float32, shape=shape)
    # returns the ServingInputReceiver object
    return tf.estimator.export.build_raw_serving_input_receiver_fn(
        {INPUT_TENSOR_NAME: tensor})()
I've submitted a training job to the cloud using the RESTful API and see in the console logs that it completed successfully. In order to deploy the model and use it for predictions I have saved the final model using tf.train.Saver().save() (according to the how-to guide).
When running locally, I can find the graph files (export-* and export-*.meta) in the working directory. When running on the cloud however, I don't know where they end up. The API doesn't seem to have a parameter for specifying this, it's not in the bucket with the trainer app, and I can't find any temporary buckets on the cloud storage created by the job.
When you set up your Cloud ML environment you set up a bucket for this purpose. Have you looked in there?
https://cloud.google.com/ml/docs/how-tos/getting-set-up
Edit (for future record): As Robert mentioned in the comments, you'll want to pass the output location to the job as an argument. A couple of things to be mindful of:
Use a unique output location per job, so one job doesn't clobber over the outputs of another.
The recommendation is to specify the parent output path, and use it to contain the exported model in a subpath called 'model', as well as organizing other outputs like checkpoints and summaries within that path. That makes it easier to manage all the outputs.
While not required, I'll also suggest staging the training code in a packages subpath of the output, which helps correlate the source with the outputs it produces.
Finally(!), also keep in mind that when you use hyperparameter tuning, you'll need to append the trial id to the output path for the outputs produced by individual runs, as in the sketch below.
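A small sketch of those points in a trainer; the --output_path flag name is an assumption, and the trial id is read from the TF_CONFIG environment variable that Cloud ML sets during hyperparameter tuning:

# Hypothetical trainer snippet: take the output location as an argument and
# organize everything under it. The flag name --output_path is an assumption.
import argparse
import json
import os

parser = argparse.ArgumentParser()
parser.add_argument('--output_path', required=True,
                    help='e.g. gs://my-bucket/jobs/my_job_name')
args = parser.parse_args()

# During hyperparameter tuning, the trial id appears in TF_CONFIG;
# append it so individual trials don't clobber each other's outputs.
tf_config = json.loads(os.environ.get('TF_CONFIG', '{}'))
trial = tf_config.get('task', {}).get('trial', '')
output_path = os.path.join(args.output_path, trial) if trial else args.output_path

model_dir = os.path.join(output_path, 'model')              # exported model
checkpoint_dir = os.path.join(output_path, 'checkpoints')   # checkpoints, summaries
packages_dir = os.path.join(output_path, 'packages')        # staged training code
# ... use model_dir as the target for tf.train.Saver().save() / the export step ...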