We're trying to deploy a custom optimizer model to SageMaker. Our model consists of a number of .py files distributed across the repo, plus some external library dependencies such as ortools. Input CSV files can be placed in an S3 bucket. The output of our model is a pickle file derived from the input CSV files (these will differ each time someone runs a job).
We would prefer not to use ECR, but if there's no other option, can we follow the link below to achieve what we're aiming for? This SageMaker endpoint is expected to be called from a Step Function.
https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html
I'd encourage you to check out the examples here for BYOC (Bring Your Own Container) deployment.
We'd need more information, particularly about the framework and model, to suggest anything further.
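One way to avoid ECR is to reuse a pre-built SageMaker framework container (such as the SKLearn one) and supply only a custom inference script plus a requirements.txt for extra libraries like ortools. A minimal sketch of the handler functions those containers look for, under the assumption that the model is a pickled object exposing a hypothetical `.solve()` method and the payload is CSV:

```python
# inference.py -- sketch of the handler functions SageMaker's pre-built
# framework containers look for. The file name "model.pkl" and the
# model's .solve() method are illustrative assumptions, not fixed APIs.
import io
import os
import pickle

import pandas as pd


def model_fn(model_dir):
    """Load the pickled optimizer from the extracted model.tar.gz."""
    with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
        return pickle.load(f)


def input_fn(request_body, content_type="text/csv"):
    """Parse the incoming CSV payload into a DataFrame."""
    if content_type == "text/csv":
        return pd.read_csv(io.StringIO(request_body))
    raise ValueError(f"Unsupported content type: {content_type}")


def predict_fn(data, model):
    """Run the optimizer on the parsed input."""
    return model.solve(data)
```

Extra dependencies go in a requirements.txt shipped alongside the script; the pre-built container installs them at startup, so no custom image needs to be pushed to ECR.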
Related
I'm running a training job using AWS SageMaker, and I'm using a custom Estimator based on an available docker image from AWS. I wanted to get some feedback on whether my process is correct or not prior to deployment.
I'm running the training job in a docker container using 'local' mode in a SageMaker notebook instance, and the training job runs successfully. However, after the job completes and saves the model to opt/model/models within the docker image, once the docker container exits, the model saved from training is lost. Ideally, I'd like to use the model for inference, but I'm not sure about the best way of doing it. I have also tried the training job after pushing the image to ECR, but the same thing happens.
It is my understanding that the docker state is lost, once the image exits, as such, is it possible to persist the model that was produced in training in the image? One option I have thought about is saving the model output to an S3 bucket once the training job is complete, then pulling that model into another docker image for inference. Is this expected behaviour and the correct way of doing it?
I am fairly new to using SageMaker, but I'd like to do it according to best practices. I've looked at a lot of the AWS documents and followed the tutorials, but they don't seem to mention explicitly whether this is how it should be done.
Thanks for any feedback on this.
You can refer to Rok's comment on saving a model file when you're using a custom estimator. That said, SageMaker built-in estimators save the model artifacts to S3. To make inferences using that model, you can either use a real-time inference endpoint for real-time predictions, or a batch transformer to run inferences in batch mode. In both cases, you'll have to point the configuration to the inference container and the model artifacts. The amazon-sagemaker-examples repository has examples for common frameworks; in particular, the scikit-learn example has detailed explanations.
Also, make sure the model is being saved to /opt/ml/model/, not opt/model/models as mentioned in your question.
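To make the "save to S3" behavior concrete: SageMaker sets the SM_MODEL_DIR environment variable (normally /opt/ml/model) inside the training container, and anything written there is tarred up and uploaded to S3 as model.tar.gz when the job ends. A minimal sketch, where the dict stands in for a real trained model:

```python
# train.py -- sketch: write the trained model where SageMaker expects it,
# so it survives container exit as a model.tar.gz in S3. The dict used
# as a "model" is a placeholder.
import os
import pickle


def save_model(model, model_dir=None):
    """Persist the model to SM_MODEL_DIR (default /opt/ml/model)."""
    model_dir = model_dir or os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path


if __name__ == "__main__":
    save_model({"weights": [0.1, 0.2]})
```

With the artifact in S3, a separate inference container (endpoint or batch transform) can then be pointed at it, which is exactly the workflow described in the question.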
I want to run batch predictions inside Google Cloud's vertex.ai using a custom trained model. I was able to find documentation to get online prediction working with a custom built docker image by setting up an endpoint, but I can't seem to find any documentation on what the Dockerfile should be for batch prediction. Specifically how does my custom code get fed the input and where does it put the output?
The documentation I've found is here; it certainly looks possible to use a custom model, and when I tried it, it didn't complain at first, but eventually it did throw an error. According to the documentation, no endpoint is required for running batch jobs.
I am using AWS SageMaker to deploy my speech models trained outside of SageMaker. I am able to convert my model into something SageMaker would understand and have deployed it as an endpoint. The problem is that SageMaker directly loads the model and calls .predict to get the inference. I am unable to figure out where I can add my preprocessing functions in the deployed model. It is suggested to use AWS Lambda or another server for preprocessing. Is there any way I can incorporate complex preprocessing (which cannot be done with a simple scikit-learn or pandas-like framework) in SageMaker itself?
You will want to adjust the predictor.py file in the container that you are bringing your speech models in. Assuming you are using Bring Your Own Container to deploy these models on SageMaker, you will want to adjust the predictor code to include the preprocessing functionality that you need. For any extra dependencies, make sure to update the Dockerfile that you are bringing. Having the preprocessing functionality within the predictor file will ensure your data is transformed and processed as you desire before predictions are returned. This will add to the response time, however, so if you have heavy preprocessing workloads or ETL that needs to occur, you may want to look into a service such as AWS Glue (ETL) or Kinesis (real-time data streaming/data transformation). If you choose to use Lambda, keep in mind the 15-minute timeout limit.
I work for AWS & my opinions are my own
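As a rough illustration of folding preprocessing into the predictor code path: the peak-normalization step below is a placeholder for whatever your real speech pipeline does, and DummyModel stands in for your actual model object. Only the shape of the flow (preprocess, then predict, inside one handler) is the point.

```python
# Sketch: preprocessing folded into the container's predictor code.
# preprocess() and DummyModel are illustrative placeholders.

def preprocess(samples):
    """Peak-normalize a raw audio vector (stand-in for real DSP)."""
    peak = max(abs(s) for s in samples) or 1.0
    return [s / peak for s in samples]


class DummyModel:
    """Placeholder for the real speech model loaded in the container."""

    def predict(self, features):
        return "speech" if sum(abs(f) for f in features) > 1.0 else "silence"


def handle_request(model, raw_samples):
    """What predictor.py would do per request: preprocess, then predict."""
    features = preprocess(raw_samples)
    return model.predict(features)
```

In a real BYOC predictor.py this handler would sit behind the container's /invocations route, with the heavy dependencies installed via the Dockerfile.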
I have a pre-trained model.pkl file, along with all the other files related to the ML model. I want to deploy it on AWS SageMaker.
But without training, how do I deploy it to AWS SageMaker? The fit() method in SageMaker runs the training job and pushes model.tar.gz to an S3 location, and when the deploy method is used, it uses that same S3 location to deploy the model. We don't manually create that S3 location; it is created by SageMaker and named using a timestamp. How can we put our own model.tar.gz file in an S3 location and call deploy() using that location?
All you need is:
to have your model in an arbitrary S3 location in a model.tar.gz archive
to have an inference script in a SageMaker-compatible docker image that is able to read your model.pkl, serve it and handle inferences.
to create an endpoint associating your artifact to your inference code
When you ask for an endpoint deployment, SageMaker will take care of downloading your model.tar.gz and uncompressing it to the appropriate location in the server's docker image, which is /opt/ml/model
Depending on the framework you use, you may use either a pre-existing docker image (available for Scikit-learn, TensorFlow, PyTorch, MXNet) or you may need to create your own.
Regarding custom image creation, see the specification here, and here two examples of custom containers, for R and sklearn (the sklearn one is less relevant now that there is a pre-built docker image along with a SageMaker sklearn SDK)
Regarding leveraging existing containers for Sklearn, PyTorch, MXNet, TF, check this example: Random Forest in SageMaker Sklearn container. In this example, nothing prevents you from deploying a model that was trained elsewhere. Note that with a train/deploy environment mismatch, you may run into errors due to software version differences, though.
Regarding your following experience:
when deploy method is used it uses the same s3 location to deploy the
model, we don't manual create the same location in s3 as it is created
by the aws model and name it given by using some timestamp
I agree that sometimes the demos that use the SageMaker Python SDK (one of the many available SDKs for SageMaker) may be misleading, in the sense that they often leverage the fact that an Estimator that has just been trained can be deployed (Estimator.deploy(..)) in the same session, without having to instantiate the intermediary model concept that maps inference code to model artifact. This design is presumably done for the sake of code compactness, but in real life, training and deployment of a given model may well be done from different scripts running in different systems. It's perfectly possible to deploy a model without having trained it previously in the same session: you need to instantiate a sagemaker.model.Model object and then deploy it.
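The two steps above can be sketched as follows. The packaging part is plain Python; the deployment part requires AWS credentials, so it is shown commented out, and the bucket, image URI, and role ARN are placeholders you would replace with your own:

```python
# Sketch: package a pre-trained model.pkl into the model.tar.gz layout
# SageMaker expects, then attach it to a sagemaker.model.Model.
import os
import tarfile


def package_model(pkl_path, out_dir):
    """Create model.tar.gz with model.pkl at the archive root."""
    archive = os.path.join(out_dir, "model.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(pkl_path, arcname="model.pkl")
    return archive


# Deployment sketch (needs AWS credentials; all names are placeholders):
# from sagemaker.model import Model
# model = Model(
#     model_data="s3://my-bucket/models/model.tar.gz",
#     image_uri="<inference-container-image>",
#     role="arn:aws:iam::123456789012:role/SageMakerRole",
# )
# predictor = model.deploy(initial_instance_count=1,
#                          instance_type="ml.m5.large")
```

After uploading the archive to any S3 location you control, that location is what you pass as model_data; no timestamped, SageMaker-generated path is required.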
I want to use a model that I have trained for inference on Google Cloud ML. It is an NLP model, and I want my node.js server to interact with the model to get predictions at runtime.
I have a process for running inference on the model manually, that I would like to duplicate in the cloud:
Use Stanford CoreNLP to tokenize my text and generate data files that store my tokenized text.
Have the model use those data files, create Tensorflow Examples out of it, and run the model.
Have the model print out the predictions.
Here is how I think I can replicate it in the Cloud:
Send the text to the cloud using my node.js server.
Run my python script to generate the data file. It seems like I will have to do this inside of a custom prediction routine. I'm not sure how I can use Stanford Core NLP here.
Save the data file in a bucket in Google Cloud.
In the custom prediction routine, load the saved data file and execute the model.
Can anyone tell me if this process is correct? Also, how can I run Stanford CoreNLP in a Google Cloud custom prediction routine? And is there a way for me to just run command-line scripts (for example, for creating the data files, I have a simple command that I normally just run to create them)?
You can implement a custom preprocessing method in Python and invoke the Stanford toolkit from there. See this blog and associated sample code for details: https://cloud.google.com/blog/products/ai-machine-learning/ai-in-depth-creating-preprocessing-model-serving-affinity-with-custom-online-prediction-on-ai-platform-serving