I am trying to load test a SageMaker multi-model endpoint. I found out from this link, https://docs.aws.amazon.com/sagemaker/latest/dg/inference-recommender-existing-endpoint.html, that we can't run an instance recommendation job for an existing endpoint, but the documentation for inference recommendation jobs that don't use an existing endpoint does not mention this. So I just wanted to know how we can run an inference recommendation job for a multi-model endpoint. And since the inference recommendation job will create and invoke an endpoint on its own, where can I provide the extra parameters required for a multi-model endpoint?
I checked these links:
https://docs.aws.amazon.com/sagemaker/latest/dg/inference-recommender-existing-endpoint.html
https://docs.aws.amazon.com/sagemaker/latest/dg/inference-recommender-instance-recommendation.html
Related
I am able to train a model on SageMaker and then deploy a model endpoint from it.
Now, I want to retrain my model every week with the new data that is coming in. My question is: when I retrain the model, how do I update my existing endpoint to use the latest model? (I don't want to deploy a new endpoint.)
From some exploration, I think I can do it in two ways:
Near the end of the training job, I create a new EndpointConfig and later call UpdateEndpoint. The downside of this would be that I end up with a lot of unnecessary endpoint configurations in my AWS account. Or am I thinking about it wrongly?
Near the end of the training job, I deploy the trained model using .deploy() and set update_endpoint=True, as illustrated in the SageMaker SDK docs.
I am not sure which is the better solution to accomplish this. Is there an even better way to do it?
If you are interested in doing this programmatically, use an AWS SDK (I will answer this assuming you are using Java).
Look at the AWS SDK for Java V2 Javadocs. You can use the UpdateEndpoint operation for this use case. It deploys the new EndpointConfig specified in the request, switches the endpoint to the newly created configuration, and then deletes the resources provisioned for the previous EndpointConfig (there is no availability loss).
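The answer above is framed around the Java SDK, but since the rest of this thread uses Python, here is a rough boto3 sketch of the same flow. All of the resource names are placeholders, and it assumes a model has already been created from the latest training job's artifacts:

import time
import boto3

sm = boto3.client("sagemaker")

endpoint_name = "my-endpoint"          # existing endpoint (placeholder)
new_model_name = "my-model-latest"     # model from the latest training job (placeholder)
new_config_name = f"{endpoint_name}-config-{int(time.time())}"

# 1. Create a fresh EndpointConfig that points at the newly trained model.
sm.create_endpoint_config(
    EndpointConfigName=new_config_name,
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": new_model_name,
        "InitialInstanceCount": 1,
        "InstanceType": "ml.m4.xlarge",
    }],
)

# 2. Point the existing endpoint at the new config; SageMaker swaps the
#    backing resources without availability loss.
sm.update_endpoint(EndpointName=endpoint_name, EndpointConfigName=new_config_name)

# 3. Optionally delete the previous EndpointConfig afterwards to avoid the
#    build-up of unused configurations mentioned in the question.
# sm.delete_endpoint_config(EndpointConfigName="my-endpoint-config-old")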
I have a use case for hosting multiple XGBoost models in one SageMaker endpoint. The models have slightly different feature sets and feature preprocessing.
The two options I am considering are:
Creating models with custom Docker images and hosting them in one endpoint using production variants. I will then invoke the endpoint with the variant name and the correct feature set (see the invocation sketch after this list).
SageMaker Inference Toolkit (multi-model-server). In the handler script I plan to preprocess the input differently based on the model name.
Are these the right approaches for the problem? Or is there a better approach for working with SageMaker and multiple XGBoost models with pre- and post-processing?
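To make the first option concrete, here is a rough sketch of what the invocation side could look like with boto3; the endpoint and variant names are made up. A multi-model endpoint built with the Inference Toolkit (the second option) would use TargetModel instead of TargetVariant:

import boto3

runtime = boto3.client("sagemaker-runtime")

# Each production variant would host one XGBoost model, with its own
# preprocessing handled inside that variant's container.
response = runtime.invoke_endpoint(
    EndpointName="xgb-multi-variant-endpoint",   # placeholder endpoint name
    TargetVariant="variant-model-a",             # placeholder variant name
    ContentType="text/csv",
    Body="1.0,2.0,3.0",
)
print(response["Body"].read())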
I have a custom machine learning predictive model. I also have a user-defined Estimator class that uses Optuna for hyperparameter tuning. I need to deploy this model to SageMaker so that I can invoke it from a Lambda function.
I'm running into trouble creating a container for the model and the Estimator.
I am aware that SageMaker has a scikit-learn container which can be used with Optuna, but how would I leverage this to include the functions from my own Estimator class? Also, the model is one of the parameters passed to this Estimator class, so how do I define it as a separate training job in order to make it an endpoint?
This is how the Estimator class and the model are invoked:
sirf_estimator = Estimator(
    SIRF, ncov_df, population_dict[countryname],
    name=countryname, places=[(countryname, None)],
    start_date=critical_country_start
)
sirf_dict = sirf_estimator.run()
where:
Model Name : SIRF
Cleaned Dataset : ncov_df
Would be really helpful if anyone could look into this, thanks a ton!
SageMaker inference endpoints currently rely on an interface based on Docker images. At the base level, you can set up a Docker image that runs a web server and responds to requests on the routes and port that AWS requires. This guide shows you how to do it: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html.
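As a minimal sketch of that contract (not a complete serving stack): SageMaker sends health checks to GET /ping and inference requests to POST /invocations on port 8080 inside the container, so a small Flask app is enough to satisfy it. The my_model_predict call below is a placeholder for your own inference code:

# app.py: minimal sketch of the SageMaker serving contract using Flask
from flask import Flask, Response, request

app = Flask(__name__)

@app.route("/ping", methods=["GET"])
def ping():
    # Health check: return 200 once the model is loaded and ready.
    return Response(status=200)

@app.route("/invocations", methods=["POST"])
def invocations():
    payload = request.get_data()
    # Deserialize the payload, run the model, and serialize the result.
    result = my_model_predict(payload)  # placeholder for your own inference code
    return Response(result, status=200, mimetype="text/csv")

if __name__ == "__main__":
    # SageMaker routes traffic to port 8080 inside the container.
    app.run(host="0.0.0.0", port=8080)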
This is an annoying amount of work. If you're using a well-known framework, AWS maintains a container library with boilerplate code that you might be able to reuse and customize: https://github.com/aws/sagemaker-containers.
Or don't use SageMaker inference endpoints at all :) If your model can fit within the size / memory restrictions of AWS Lambda, that is an easier option!
I am aware that it is possible to deploy custom containers for training jobs on Google Cloud, and I have been able to get this running using the following command:
gcloud ai-platform jobs submit training infer name --region some_region --master-image-uri=path/to/docker/image --config config.yaml
The training job completed successfully and the model was obtained. Now I want to use this model for inference, but part of my code has system-level dependencies, so I have to make some modifications to the architecture in order to keep it running. This was the reason for using a custom container for the training job in the first place.
To the best of my knowledge, documentation is only available for the training part; the inference part with custom containers (if it is possible at all) has not been covered.
The documentation for the training part is available at this link.
My question is: is it possible to deploy custom containers for inference purposes on Google Cloud ML?
This response refers to using Vertex AI Prediction, the newest platform for ML on GCP.
Suppose you wrote the model artifacts out to Cloud Storage from your training job.
The next step is to create the custom container and push it to a registry, following what is described here:
https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements
This section describes how you pass the model artifact directory to the custom container to be used for inference:
https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#artifacts
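In practice (per that page), Vertex AI exposes the artifact location, along with the serving port and routes it expects, to your container through AIP_* environment variables. A serving process inside the custom container can pick them up roughly like this (a sketch, with fallback defaults of my own choosing):

import os

# Environment variables Vertex AI sets inside the serving container.
artifact_uri = os.environ.get("AIP_STORAGE_URI")              # gs:// path to the model artifacts
port = int(os.environ.get("AIP_HTTP_PORT", "8080"))
health_route = os.environ.get("AIP_HEALTH_ROUTE", "/health")
predict_route = os.environ.get("AIP_PREDICT_ROUTE", "/predict")

# Load the model from artifact_uri, then start a web server on `port`
# that answers GET health_route and POST predict_route.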
You will also need to create an endpoint in order to deploy the model:
https://cloud.google.com/vertex-ai/docs/predictions/deploy-model-api#aiplatform_deploy_model_custom_trained_model_sample-gcloud
Finally, you would use gcloud ai endpoints deploy-model ... to deploy the model to the endpoint:
https://cloud.google.com/sdk/gcloud/reference/ai/endpoints/deploy-model
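If you prefer the Vertex AI Python SDK to the gcloud commands above, the same upload/deploy flow looks roughly like this; the project, region, image URI, and bucket paths are placeholders:

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Register the model, pointing at the custom serving image and the
# training job's artifact directory in Cloud Storage (placeholder paths).
model = aiplatform.Model.upload(
    display_name="my-custom-model",
    serving_container_image_uri="us-docker.pkg.dev/my-project/my-repo/serving-image:latest",
    artifact_uri="gs://my-bucket/training-output/",
)

# Create an endpoint and deploy the model to it.
endpoint = aiplatform.Endpoint.create(display_name="my-custom-endpoint")
model.deploy(endpoint=endpoint, machine_type="n1-standard-4")

# Online prediction against the deployed model.
print(endpoint.predict(instances=[[1.0, 2.0, 3.0]]).predictions)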
I have started exploring AWS SageMaker, beginning with these examples provided by AWS. I then made some modifications to this particular setup so that it uses the data from my use case for training.
Now, as I continue to work on this model and its tuning, I would like to be able to recreate the endpoint after deleting it, even after stopping and restarting the notebook instance (so the notebook/kernel session is no longer valid), using the already trained model artifacts that get uploaded to S3 under the /output folder.
Now I cannot simply jump directly to this line of code:
bt_endpoint = bt_model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')
I did some searching, including Amazon's own example of hosting pre-trained models, but I am a little lost. I would appreciate any guidance, examples, or documentation that I could emulate and adapt to my case.
Your comment is correct - you can re-create an Endpoint given an existing EndpointConfiguration. This can be done via the console, the AWS CLI, or the SageMaker boto client.
https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-endpoint.html
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint
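As a small boto3 sketch (names below are placeholders; this assumes the model and endpoint configuration from the original training run still exist), re-creating the endpoint is a single call, and the commented-out part shows how the model and config could be rebuilt from the S3 artifacts if they were deleted as well:

import boto3

sm = boto3.client("sagemaker")

# Re-create the endpoint from the existing EndpointConfiguration.
sm.create_endpoint(
    EndpointName="bt-endpoint",               # placeholder name
    EndpointConfigName="bt-endpoint-config",  # existing EndpointConfiguration name
)

# If the model and config were deleted too, rebuild them from the S3 artifacts first:
# sm.create_model(
#     ModelName="bt-model",
#     PrimaryContainer={
#         "Image": "<inference-image-uri-used-for-training>",
#         "ModelDataUrl": "s3://my-bucket/.../output/model.tar.gz",
#     },
#     ExecutionRoleArn="<sagemaker-execution-role-arn>",
# )
# sm.create_endpoint_config(
#     EndpointConfigName="bt-endpoint-config",
#     ProductionVariants=[{"VariantName": "AllTraffic", "ModelName": "bt-model",
#                          "InitialInstanceCount": 1, "InstanceType": "ml.m4.xlarge"}],
# )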