Custom code containers for google cloud-ml for inference - google-cloud-platform

I am aware that it is possible to deploy custom containers for training jobs on google cloud and I have been able to get the same running using command.
gcloud ai-platform jobs submit training infer name --region some_region --master-image-uri=path/to/docker/image --config config.yaml
The training job was completed successfully and the model was successfully obtained, Now I want to use this model for inference, but the issue is a part of my code has system level dependencies, so I have to make some modification into the architecture in order to get it running all the time. This was the reason to have a custom container for the training job in the first place.
The documentation is only available for the training part and the inference part, (if possible) with custom containers has not been explored to the best of my knowledge.
The training part documentation is available on this link
My question is, is it possible to deploy custom containers for inference purposes on google cloud-ml?

This response refers to using Vertex AI Prediction, the newest platform for ML on GCP.
Suppose you wrote the model artifacts out to cloud storage from your training job.
The next step is to create the custom container and push to a registry, by following something like what is described here:
https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements
This section describes how you pass the model artifact directory to the custom container to be used for interence:
https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#artifacts
You will also need to create an endpoint in order to deploy the model:
https://cloud.google.com/vertex-ai/docs/predictions/deploy-model-api#aiplatform_deploy_model_custom_trained_model_sample-gcloud
Finally, you would use gcloud ai endpoints deploy-model ... to deploy the model to the endpoint:
https://cloud.google.com/sdk/gcloud/reference/ai/endpoints/deploy-model

Related

Running custom scripts in a custom container while running a sagemaker training job

I am currently in the process of setting up a custom sagemaker container to run training jobs on Sagemaker and have succeeded in doing so. However, I am a bit confused over this question which is currently bugging me and is definitely something that I need to consider in the future
Is it possible to run custom scripts on a custom container when declaring a sagemaker training job?
My current understanding when it comes to creating a custom sagemaker image is that I create a train file that gets executed when running a training job, but I could never find documentation on whether is it possible to overwrite this and run a training script (but using the same custom container), like how we run training jobs using in-built algorithms. Is it the case that for custom algorithms we are restricted by this limitation?
I don't fully understand your question (can you make an example please?), specially when you say
like how we run training jobs using in-built algorithms
But basically you can do whatever you want in your container, as probably you already did, you have a proper train file in your container, which is the one that sagemaker calls as the entrypoint. In that file you can call external script (which are in your container too) and also pass parameters to your container (see how for example hyperparameters are passed). There is a quite clear documentation here.

AWS Sagemaker - Custom Training Job not saving Model output

I'm running a training job using AWS SageMaker and i'm using a custom Estimator based on an available docker image from AWS. I wanted to get some feedback on whether my process is correct or not prior to deployment.
I'm running the training job in a docker container using 'local' in a SageMaker notebook instance and the training job runs successfully. However, after the job completes and saves the model to opt/model/models within the docker image, once the docker container exits, the model saved from training is lost. Ideally, i'd like to use the model for inference, however, I'm not sure about the best way of doing it. I have also tried the training job after pushing the image to ECR, but the same thing happens.
It is my understanding that the docker state is lost, once the image exits, as such, is it possible to persist the model that was produced in training in the image? One option I have thought about is saving the model output to an S3 bucket once the training job is complete, then pulling that model into another docker image for inference. Is this expected behaviour and the correct way of doing it?
I am fairly new to using SageMaker but i'd like to do it according to best practices. I've looked at a lot of the AWS documents and followed the tutorials but it doesn't seem to mention explicitly if this is how it should be done.
Thanks for any feedback on this.
You can refer to Rok's comment on saving a model file when you're using a custom estimator. That said, SageMaker built-in estimators save the model artifacts to S3. To make inferences using that model, you can either use a real-time inference endpoint for real time predictions, or a batch transformer to run inferences in batch mode. In both cases, you'll have to point the configuration to the container for inference and the model artifacts. the amazon-sagemaker-examples repository has examples for common frameworks, especially, the scikit-learn example has detailed explanations.
Also, make sure the model is being saved to /opt/ml/model/, not opt/model/models as mentioned in your question.

How to deploy a custom model in AWS SageMaker?

I have a custom machine learning predictive model. I also have a user defined Estimator class that uses Optuna for hyperparameter tuning. I need to deploy this model to SageMaker so as to invoke it from a lambda function.
I'm facing trouble in the process of creating a container for the model and the Estimator.
I am aware that SageMaker has a scikit learn container which can be used for Optuna, but how would I leverage this to include the functions from my own Estimator class? Also, the model is one of the parameters passed to this Estimator class so how do I define it as a separate training job in order to make it an Endpoint?
This is how the Estimator class and the model are invoked:
sirf_estimator = Estimator(
SIRF, ncov_df, population_dict[countryname],
name=countryname, places=[(countryname, None)],
start_date=critical_country_start
)
sirf_dict = sirf_estimator.run()
where:
Model Name : SIRF
Cleaned Dataset : ncov_df
Would be really helpful if anyone could look into this, thanks a ton!
The SageMaker inference endpoints currently rely on an interface based on Docker images. At the base level, you can set up a Docker image that runs a web server and responds to the endpoints on the ports that AWS require. This guide will show you how to do it: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html.
This is an annoying amount of work. If you're using a well-known framework they have a container library that contains some boilerplate code you might be able to reuse: https://github.com/aws/sagemaker-containers. You might be able to reuse some code from there, but customize it.
Or don't use SageMaker inference endpoints at all :) If your model can fit within the size / memory restrictions of AWS Lambda, that is an easier option!

how to run a pre-trained model in AWS sagemaker?

I have a model.pkl file which is pre-trained and all other files related to the ml model. I want it to deploy it on the aws sagemaker.
But without training, how to deploy it to the aws sagmekaer, as fit() method in aws sagemaker run the train command and push the model.tar.gz to the s3 location and when deploy method is used it uses the same s3 location to deploy the model, we don't manual create the same location in s3 as it is created by the aws model and name it given by using some timestamp. How to put out our own personalized model.tar.gz file in the s3 location and call the deploy() function by using the same s3 location.
All you need is:
to have your model in an arbitrary S3 location in a model.tar.gz archive
to have an inference script in a SageMaker-compatible docker image that is able to read your model.pkl, serve it and handle inferences.
to create an endpoint associating your artifact to your inference code
When you ask for an endpoint deployment, SageMaker will take care of downloading your model.tar.gz and uncompressing to the appropriate location in the docker image of the server, which is /opt/ml/model
Depending on the framework you use, you may use either a pre-existing docker image (available for Scikit-learn, TensorFlow, PyTorch, MXNet) or you may need to create your own.
Regarding custom image creation, see here the specification and here two examples of custom containers for R and sklearn (the sklearn one is less relevant now that there is a pre-built docker image along with a sagemaker sklearn SDK)
Regarding leveraging existing containers for Sklearn, PyTorch, MXNet, TF, check this example: Random Forest in SageMaker Sklearn container. In this example, nothing prevents you from deploying a model that was trained elsewhere. Note that with a train/deploy environment mismatch you may run in errors due to some software version difference though.
Regarding your following experience:
when deploy method is used it uses the same s3 location to deploy the
model, we don't manual create the same location in s3 as it is created
by the aws model and name it given by using some timestamp
I agree that sometimes the demos that use the SageMaker Python SDK (one of the many available SDKs for SageMaker) may be misleading, in the sense that they often leverage the fact that an Estimator that has just been trained can be deployed (Estimator.deploy(..)) in the same session, without having to instantiate the intermediary model concept that maps inference code to model artifact. This design is presumably done on behalf of code compacity, but in real life, training and deployment of a given model may well be done from different scripts running in different systems. It's perfectly possible to deploy a model with training it previously in the same session, you need to instantiate a sagemaker.model.Model object and then deploy it.

Google cloud ML without trainer

Can we train a model by just giving data and related column names without creating trainer in Google Cloud ML either using Rest API or command line interface
Yes. You can use Google Cloud Datalab, which comes with a structured data solution. It has an easier interface and takes care of the trainer. You can view the notebooks without setting up Datalab:
https://github.com/googledatalab/notebooks/tree/master/samples/ML%20Toolbox
Once you set up Datalab, you can run the notebook. To set up Datalab, check https://cloud.google.com/datalab/docs/quickstarts.
Instead of building a model and calling CloudML service directly, you can try Datalab's ml toolbox which supports structured data and image classification. The ml toolbox takes your data, and automatically builds and trains a model. You just have to describe your data and what you want to do.
You can view the notebooks first without setting up datalab:
https://github.com/googledatalab/notebooks/tree/master/samples/ML%20Toolbox
To set up Datalab and actually run these notebooks, see https://cloud.google.com/datalab/docs/quickstarts.