How to update Sagemaker Endpoint with the newly Trained Model? - amazon-web-services

I am able to train a model on SageMaker and then deploy a model endpoint from it.
Now I want to retrain my model every week with the new data that is coming in. My question is: when I retrain the model, how do I update my existing endpoint to use the latest model? (I don't want to deploy a new endpoint.)
From some exploration, I think I can do it in 2 ways:
1. Near the end of the training job, I create a new EndpointConfig and later use UpdateEndpoint. The downside would be that I end up with a lot of unnecessary Endpoint Configurations in my AWS account. Or am I thinking about it wrongly?
2. Near the end of the training job, I deploy the trained model using .deploy() and set update_endpoint=True, as illustrated in the SageMaker SDK docs.
I am not sure which is the better solution. Is there an even better way to do this?

If you are interested in doing this programmatically, use an AWS SDK (I will answer this assuming you are using Java).
Look at the AWS SDK for Java V2 Javadocs. You can use UpdateEndpoint for this use case. This operation deploys the new EndpointConfig specified in the request, switches the endpoint over to the newly provisioned resources, and then deletes the resources provisioned for the endpoint using the previous EndpointConfig (there is no availability loss).
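For readers working in Python rather than Java, the same UpdateEndpoint flow could be sketched with a boto3 SageMaker client. This is only a sketch under stated assumptions: the function name, the timestamp-based config naming, the variant name "AllTraffic", and the default instance type are illustrative choices, and a SageMaker Model called `model_name` is assumed to exist already. Deleting the previous EndpointConfig once the endpoint is back in service is one way to keep old configs from piling up.

```python
import time


def roll_endpoint_to_model(sm, endpoint_name, model_name,
                           instance_type="ml.m5.large", instance_count=1):
    """Point an existing endpoint at `model_name` via a fresh EndpointConfig.

    `sm` is a SageMaker client, e.g. boto3.client("sagemaker").
    Returns the name of the newly created EndpointConfig.
    """
    # Remember which config the endpoint uses now so it can be cleaned up later.
    old_config = sm.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]

    # Config names must be unique; a timestamp suffix is an assumed naming scheme.
    new_config = f"{endpoint_name}-{int(time.time())}"
    sm.create_endpoint_config(
        EndpointConfigName=new_config,
        ProductionVariants=[{
            "VariantName": "AllTraffic",  # assumed variant name
            "ModelName": model_name,
            "InitialInstanceCount": instance_count,
            "InstanceType": instance_type,
        }],
    )

    # Switch the endpoint to the new config and block until it is InService.
    sm.update_endpoint(EndpointName=endpoint_name, EndpointConfigName=new_config)
    sm.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)

    # The old config is deleted only after the update has finished.
    sm.delete_endpoint_config(EndpointConfigName=old_config)
    return new_config
```

A weekly retraining job might call `roll_endpoint_to_model(boto3.client("sagemaker"), "my-endpoint", "my-model-2024-w12")` as its last step, where both names are hypothetical.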

Related

AWS Sagemaker - Custom Training Job not saving Model output

I'm running a training job using AWS SageMaker with a custom Estimator based on an available Docker image from AWS. I wanted to get some feedback on whether my process is correct prior to deployment.
I'm running the training job in a Docker container using 'local' in a SageMaker notebook instance, and the training job runs successfully. However, after the job completes and saves the model to opt/model/models within the Docker image, the model saved from training is lost once the Docker container exits. Ideally, I'd like to use the model for inference, but I'm not sure about the best way of doing it. I have also tried the training job after pushing the image to ECR, but the same thing happens.
It is my understanding that the Docker state is lost once the container exits. Is it possible to persist the model that was produced in training in the image? One option I have thought about is saving the model output to an S3 bucket once the training job is complete, then pulling that model into another Docker image for inference. Is this expected behaviour and the correct way of doing it?
I am fairly new to SageMaker, but I'd like to do this according to best practices. I've looked at a lot of the AWS documents and followed the tutorials, but they don't seem to mention explicitly whether this is how it should be done.
Thanks for any feedback on this.
You can refer to Rok's comment on saving a model file when you're using a custom estimator. That said, SageMaker built-in estimators save the model artifacts to S3. To make inferences using that model, you can either use a real-time inference endpoint for real-time predictions or a batch transformer to run inferences in batch mode. In both cases, you'll have to point the configuration to the inference container and the model artifacts. The amazon-sagemaker-examples repository has examples for common frameworks; the scikit-learn example in particular has detailed explanations.
Also, make sure the model is being saved to /opt/ml/model/, not opt/model/models as mentioned in your question.
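As a rough illustration of that last point, a training script in a custom container could write its artifact as sketched below. SageMaker archives whatever is in /opt/ml/model when the training job ends and uploads it to the job's S3 output location. The function name, the use of pickle, and the file name model.pkl are assumptions for the sketch; the SM_MODEL_DIR environment variable is set by SageMaker's framework containers and may be absent in a fully custom image, hence the fallback.

```python
import os
import pickle


def save_model(model, model_dir=None):
    """Pickle `model` into the directory SageMaker archives after training."""
    # Fall back to the documented default path when SM_MODEL_DIR is not set.
    model_dir = model_dir or os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    os.makedirs(model_dir, exist_ok=True)

    # "model.pkl" is an assumed file name; the inference code must load the same one.
    path = os.path.join(model_dir, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path
```

Calling `save_model(trained_model)` at the end of the training entry point, with no explicit directory, would then place the file where SageMaker expects it.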

Amazon SageMaker Model Registry / Pipelines - how to manually set a Stage for a given Model Version?

This might be a very specific question, but I will try anyway.
I want to explicitly set the Stage column in Model registry for a given Model Version:
This picture comes from the documentation and it gets set only when you run the example SageMaker Projects MLOps Templates they provide. When I create the Model Package (i.e. Model Version) manually, the column remains empty. How do I set it? What API do I call?
Additionally, the documentation on browsing the model version history has the following sentence
How do we send that exact event ("Deployed to stage XYZ") manually?
I already thoroughly went over all the files SageMaker MLOps Project generates (CodeBuild Builds, CodePipeline, CloudFormation, various .py files, SageMaker Pipeline) but could not find any direct and explicit call for that event.
I think it may be somehow connected to the Tag sagemaker:deployment-stage but I've already set it on Endpoint, EndpointConfiguration and Model, with no success. I also tried to blindly call the UpdateModelPackage API and set Stage in CustomerMetadataProperties. Again - no luck.
The only thing I get in that Activity tab is that given Model Version is deployed to Inference endpoint:
You can set the status with the ModelApprovalStatus parameter in the create_model_package or update_model_package API.
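A minimal sketch of that call with a boto3 SageMaker client might look like the following. The helper name and the validation step are assumptions added for illustration; note that this changes the approval status of the model version, which is not necessarily the same thing as the Stage column asked about.

```python
VALID_STATUSES = {"Approved", "Rejected", "PendingManualApproval"}


def set_approval_status(sm, model_package_arn, status, description=""):
    """Update the approval status of one model version.

    `sm` is a SageMaker client, e.g. boto3.client("sagemaker").
    """
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {sorted(VALID_STATUSES)}")

    kwargs = {"ModelPackageArn": model_package_arn, "ModelApprovalStatus": status}
    # The description is optional, so it is only sent when provided.
    if description:
        kwargs["ApprovalDescription"] = description
    return sm.update_model_package(**kwargs)
```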
A model package state change should create an event in EventBridge, like many other SageMaker events (https://docs.aws.amazon.com/sagemaker/latest/dg/automating-sagemaker-with-eventbridge.html#eventbridge-model-package), which enables you to run the automation of your choice.
In the default SageMaker Pipelines Project template, you can see the EventBridge-driven proposed logic in the CodePipeline pipeline created for deployment: you can see on top "Trigger - CloudWatchEvent".
You don't see the event source as code in the git, because the status change is expected to be done in the Studio model registry UI in that demo template.
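To make the trigger less opaque, here is a hedged sketch of an EventBridge rule that matches model package state changes. The helper names, the rule name, and the optional filters are assumptions for illustration; the rule still needs a target (for example a CodePipeline pipeline or a Lambda function) attached separately before it does anything.

```python
import json


def model_package_event_pattern(group_name=None, approval_status=None):
    """Build an EventBridge pattern for SageMaker model package state changes."""
    pattern = {
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Model Package State Change"],
    }
    # Optional filters narrow the rule to one group and/or one approval status.
    detail = {}
    if group_name:
        detail["ModelPackageGroupName"] = [group_name]
    if approval_status:
        detail["ModelApprovalStatus"] = [approval_status]
    if detail:
        pattern["detail"] = detail
    return pattern


def create_rule(events, rule_name, pattern):
    """Create or update the rule; `events` is e.g. boto3.client("events")."""
    return events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(pattern),
        State="ENABLED",
    )
```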
Those EventBridge events emitted by the Model Registry can also be seen in a few blog posts:
Taming Machine Learning on AWS with MLOps: A Reference Architecture
Patterns for multi-account, hub-and-spoke Amazon SageMaker model registry
Build MLOps workflows with Amazon SageMaker projects, GitLab, and GitLab pipelines
I was having the exact same issue: I wanted to change the model stage but could not find where it was being done in the sample code AWS provides.
After some research and looking into the sample code, I realized that it was being done in the CloudFormation execution. First they add the tag
'sagemaker:deployment-stage': stage_config['Parameters']['StageName']
and then the CloudFormation execution (cfnUpdate call) updates the stage and deploys.
I couldn't find another way to change the stage with a call to update_model_package or other methods.

Calling SageMaker Notebook instance function by endpoint

I am a newbie in AWS. Right now I have defined an image segmentation function in a SageMaker notebook instance, and it returns masks.
I didn't train my models there. What I have done is pip install the model packages there and upload the pre-trained weights manually. The rest is very similar to working on a local machine: I import the package, load the weights, and define a function that takes an image as input and outputs masks.
My question is: is there a way to host my function so that I can call it with a URL endpoint plus the image data, and have it return the masks in the response?
Again, I am so new to AWS that I am beginning to suspect SageMaker is not designed for this job... The reason I chose SageMaker is the need for computing capacity; I don't think I can do this job with pure Lambda.
SageMaker inference endpoints currently rely on an interface based on Docker images. At the base level, you can set up a Docker image that runs a web server and responds to the endpoints on the ports that AWS requires. This guide will show you how to do it: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html.
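To give a feel for what that web server has to do, here is a bare-bones sketch using only the Python standard library. A SageMaker hosting container is expected to answer GET /ping (health check) and POST /invocations (inference) on port 8080. The `predict` function is a hypothetical placeholder standing in for the real segmentation function, and a production container would normally use a sturdier server than `http.server`.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(payload):
    """Hypothetical placeholder for real inference; reports the input size."""
    return {"received_bytes": len(payload)}


class InferenceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Health check: SageMaker expects HTTP 200 from GET /ping.
        self._reply(200 if self.path == "/ping" else 404, b"")

    def do_POST(self):
        if self.path != "/invocations":
            self._reply(404, b"")
            return
        # Read the raw request body (e.g. image bytes) and run inference on it.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        result = json.dumps(predict(body)).encode()
        self._reply(200, result, "application/json")

    def _reply(self, status, body, content_type="text/plain"):
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet; real code would log requests


def serve(port=8080):
    """Run the server; this would be the container's entry point."""
    HTTPServer(("0.0.0.0", port), InferenceHandler).serve_forever()
```

In a real image, `predict` would load the pre-trained weights once at startup and return the masks in whatever serialization the caller expects.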
This is an annoying amount of work. If you're using a well-known framework they have a container library that contains some boilerplate code you might be able to reuse: https://github.com/aws/sagemaker-containers. You might have to do some customization there.
Or don't use SageMaker inference endpoints at all :) If your model can fit within the size / memory restrictions of AWS Lambda, that is an easier option!
Full disclosure: I'm working on a platform that competes with SageMaker, Model Zoo.

How to deploy a custom model in AWS SageMaker?

I have a custom machine learning predictive model. I also have a user defined Estimator class that uses Optuna for hyperparameter tuning. I need to deploy this model to SageMaker so as to invoke it from a lambda function.
I'm facing trouble in the process of creating a container for the model and the Estimator.
I am aware that SageMaker has a scikit learn container which can be used for Optuna, but how would I leverage this to include the functions from my own Estimator class? Also, the model is one of the parameters passed to this Estimator class so how do I define it as a separate training job in order to make it an Endpoint?
This is how the Estimator class and the model are invoked:
sirf_estimator = Estimator(
    SIRF, ncov_df, population_dict[countryname],
    name=countryname, places=[(countryname, None)],
    start_date=critical_country_start
)
sirf_dict = sirf_estimator.run()
where:
Model Name : SIRF
Cleaned Dataset : ncov_df
Would be really helpful if anyone could look into this, thanks a ton!
SageMaker inference endpoints currently rely on an interface based on Docker images. At the base level, you can set up a Docker image that runs a web server and responds to the endpoints on the ports that AWS requires. This guide will show you how to do it: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html.
This is an annoying amount of work. If you're using a well-known framework, they have a container library that contains some boilerplate code you might be able to reuse, with some customization: https://github.com/aws/sagemaker-containers.
Or don't use SageMaker inference endpoints at all :) If your model can fit within the size / memory restrictions of AWS Lambda, that is an easier option!
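If you do go the endpoint route, the Lambda side of the question could be sketched as follows. This assumes an endpoint that accepts and returns JSON; the factory function, the endpoint name, and the choice to forward the whole event are illustrative assumptions, not part of the original setup.

```python
import json


def make_handler(runtime, endpoint_name):
    """Build a Lambda-style handler that forwards the event to an endpoint.

    `runtime` is a SageMaker runtime client, e.g. boto3.client("sagemaker-runtime").
    """
    def handler(event, context):
        response = runtime.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="application/json",
            Body=json.dumps(event),
        )
        # response["Body"] is a stream; read it and decode the container's JSON.
        return json.loads(response["Body"].read())

    return handler
```

In the Lambda module itself, something like `lambda_handler = make_handler(boto3.client("sagemaker-runtime"), "sirf-endpoint")` would wire it up, where "sirf-endpoint" is a hypothetical endpoint name.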

Re-hosting a trained model on AWS SageMaker

I have started exploring AWS SageMaker starting with these examples provided by AWS. I then made some modifications to this particular setup so that it uses the data from my use case for training.
Now, as I continue to work on this model and its tuning, after I delete the inference endpoint once, I would like to be able to recreate the same endpoint -- even after stopping and restarting the notebook instance (so the notebook / kernel session is no longer valid) -- using the already trained model artifacts that get uploaded to S3 under the /output folder.
Now I cannot simply jump directly to this line of code:
bt_endpoint = bt_model.deploy(initial_instance_count = 1,instance_type = 'ml.m4.xlarge')
I did some searching -- including Amazon's own example of hosting pre-trained models -- but I am a little lost. I would appreciate any guidance, examples, or documentation that I could emulate and adapt to my case.
Your comment is correct - you can re-create an Endpoint given an existing EndpointConfiguration. This can be done via the console, the AWS CLI, or the SageMaker boto client.
https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-endpoint.html
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint
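With the boto client, re-creating the endpoint could look roughly like the sketch below. It assumes the EndpointConfig (and the Model it references) still exists, which depends on how the endpoint was deleted; the function name and the `wait` flag are illustrative assumptions.

```python
def recreate_endpoint(sm, endpoint_name, endpoint_config_name, wait=True):
    """Create an endpoint from an existing EndpointConfig.

    `sm` is a SageMaker client, e.g. boto3.client("sagemaker").
    Returns the endpoint status reported after creation.
    """
    sm.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=endpoint_config_name,
    )
    # Endpoint creation takes several minutes; optionally block until InService.
    if wait:
        sm.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
    return sm.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
```

Existing configs can be looked up with the client's `list_endpoint_configs` call if the name is no longer at hand.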