I have a TensorFlow 2.0 model that I would like to deploy to an AWS SageMaker endpoint. I have moved the model to an S3 bucket and executed the following code, but I get the error below because there is no TF 2.0 image. If I try to deploy with a different version (e.g. 1.4, 1.8), I get ping timeout errors.
Is it possible to create one easily? I can't find a good tutorial to follow. Or will Amazon release one at some point?
Failed. Reason: The image '520713654638.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-tensorflow:2.0-cpu-py2' does not exist..
from sagemaker.tensorflow.model import TensorFlowModel
sagemaker_model = TensorFlowModel(
    model_data='s3://sagemaker-eu-west-1-273649867642/model/model.tar.gz',
    role=role,
    framework_version='2.0',
    entry_point='train.py')
%%time
predictor = sagemaker_model.deploy(initial_instance_count=1,
                                   instance_type='ml.m4.xlarge')
Also, no images seem to support Python 3, even though the SDK warning suggests setting that when you define the model:
"The Python 2 tensorflow images will be soon deprecated and may not be supported for newer upcoming versions of the tensorflow images.
Please set the argument "py_version='py3'" to use the Python 3 tensorflow image"
SageMaker does not support TensorFlow 2.0 yet (neither py2 nor py3), but it will be available soon.
As for Python versions: py2 is supported for the currently supported TensorFlow versions. However, after Jan. 1, 2020, future framework versions will no longer support py2.
Related
I need to deploy a custom object detection model with TensorFlow on AWS, following this tutorial: https://github.com/aws-samples/amazon-sagemaker-tensorflow-object-detection-api
I'm getting this error whenever I try to deploy using this code:
predictor = model_endpoint.deploy(initial_instance_count=1, instance_type='ml.m5.large')
The problem:
update_endpoint is a no-op in sagemaker>=2.
Can you help me solve this, please?
Or can you tell me how to deploy a custom detection model on SageMaker?
Can you try using model_endpoint.update_endpoint(...)? Alternatively, you can find examples here for deploying a Tensorflow model - https://github.com/RamVegiraju/SageMaker-Deployment/tree/master/RealTime/Script-Mode/TensorFlow.
According to the documentation:
The update_endpoint argument in deploy() methods for estimators and
models is now a no-op. Please use
sagemaker.predictor.Predictor.update_endpoint() instead.
However, I recently deployed a TensorFlow 2.7.0 model successfully with SageMaker 2.70.0. As far as I know, this is a warning, not a breaking-change error.
The errors you are seeing are likely caused by other problems, not this one (bear in mind that it is a warning, not a breaking change, as of the time of this comment and with these dependency versions).
I have already implemented a SageMaker pipeline model. In particular, for an end-to-end notebook that trains a model, builds a pipeline model and deploys it, I followed this sample notebook.
Now I would like to retrain and deploy the entire pipeline every day using Airflow, but the example I have seen here only retrains and deploys a single SageMaker model.
Is there a way to retrain and deploy the entire pipeline? Thanks
SageMaker provides two options for working with Airflow:
1. Use the APIs in the SageMaker Python SDK to generate the input for the SageMaker operators in Airflow. The blog you linked goes this way. For example, it uses the training_config API in the SageMaker Python SDK and the SageMakerTrainingOperator operator in Airflow.
2. Use the PythonOperator provided by Airflow and write Python code to do what you want.
For option 1, SageMaker has only implemented APIs related to training, tuning, single-model deployment, and transform. Since you are using a pipeline model, I don't think it has the API you want.
For option 2, if you can do what you want in plain Python code with SageMaker, you should be able to wrap that code in Python callables and run them with PythonOperators. Here's an example of training in this way, provided by SageMaker:
https://sagemaker.readthedocs.io/en/stable/using_workflow.html#using-airflow-python-operator
I think you can do something similar to make Airflow work with your pipeline model.
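As a rough sketch of the second option, the whole retrain/build/deploy sequence can be bundled into one Python callable. Everything here is illustrative: `make_pipeline_callable` and the three step arguments are hypothetical names, and the lambdas are stand-ins for whatever SageMaker Python SDK calls your notebook already uses to train each stage, build the pipeline model, and deploy it.

```python
from typing import Any, Callable, Dict, List


def make_pipeline_callable(
    train_steps: List[Callable[[Dict[str, Any]], Any]],
    build_pipeline_model: Callable[[List[Any]], Any],
    deploy: Callable[[Any], str],
) -> Callable[..., str]:
    """Bundle train -> build pipeline model -> deploy into one callable."""

    def retrain_and_deploy(**context: Any) -> str:
        # Train (or fit) each stage of the pipeline in order.
        models = [step(context) for step in train_steps]
        # Combine the trained stages into a single pipeline model.
        pipeline_model = build_pipeline_model(models)
        # Deploy and return an identifier (e.g. the endpoint name).
        return deploy(pipeline_model)

    return retrain_and_deploy


# Stand-in steps so the sketch runs without AWS access (assumption:
# in real use each one wraps calls from the SageMaker Python SDK).
demo = make_pipeline_callable(
    train_steps=[lambda ctx: "preprocessor", lambda ctx: "classifier"],
    build_pipeline_model=lambda models: "+".join(models),
    deploy=lambda pipeline: f"endpoint-for-{pipeline}",
)
print(demo(ds="2020-01-01"))
```

In a DAG with a daily schedule, the callable would then be handed to Airflow with something like `PythonOperator(task_id="retrain_and_deploy", python_callable=demo, dag=dag)`. Depending on your Airflow version, you may need `provide_context=True` for the context keyword arguments to be passed in.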
I am working on GCP to run predictions using the census dataset; I'm still discovering the Google APIs (ML Engine, ...).
When I launch the prediction job, it runs successfully, but it doesn't display the result.
Can anyone help? Do you have any idea why it doesn't generate an output?
Thanks in advance :)
This is the error that occurs
https://i.stack.imgur.com/9gyTb.png
This error is common when you train with one version of TF and then try serving with a lower version. For instance, if you are using Cloud console to deploy your model, it currently has no way of letting you select the version of TensorFlow for serving, so the model is deployed using TF 1.0, but your model may have been trained with a higher version of TF (current version is 1.7).
Although the Cloud console doesn't currently let you select the version (but it will soon!), using gcloud or the REST API directly does allow you to.
In the docs, there is a section on creating a model that has code snippets under "gcloud" and "python". With gcloud you simply add the argument --runtime-version=1.6 (or whatever version) and with python you add the property "runtimeVersion": "1.6" to the body of the request.
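For the Python route, a minimal sketch of a request body carrying the `runtimeVersion` property might look like the following. The helper name `build_version_body`, the version name, and the bucket path are placeholders for illustration, not values from the question.

```python
from typing import Dict


def build_version_body(name: str, deployment_uri: str,
                       runtime_version: str) -> Dict[str, str]:
    """Build the request body for creating a model version."""
    return {
        "name": name,
        # Cloud Storage location of the exported model (placeholder below).
        "deploymentUri": deployment_uri,
        # Pin the serving TensorFlow version to match the training version.
        "runtimeVersion": runtime_version,
    }


body = build_version_body("v1", "gs://my-bucket/census/export", "1.6")
print(body)
```

Assuming the google-api-python-client library, this body would be passed to something like `ml.projects().models().versions().create(parent="projects/PROJECT/models/MODEL", body=body).execute()`, where `ml` comes from `discovery.build("ml", "v1")`.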
I am very new to SageMaker. From my first interaction, it looks like AWS SageMaker requires you to start from its Notebook. I have a training set that is ready. Is there a way to bypass setting up the Notebook and just start by uploading the training set? Or does it have to be done through the Notebook? If anyone knows of an example that fits my need, that would be great.
Amazon SageMaker is a combination of multiple services, each of which is independent of the others. You can use the notebook instances if you want to develop your models in the familiar Jupyter environment. But if you just need to train a model, you can use the training jobs without opening a notebook instance.
There are a few ways to launch a training job:
Use the high-level SDK for Python, which is similar to the way you start a training step in your Python code:
kmeans.fit(kmeans.record_set(train_set[0]))
Here is the link to the python library: https://github.com/aws/sagemaker-python-sdk
Use the low-level API to Create-Training-Job. You can do that using various SDKs (Java, Python, JavaScript, C#...) or the CLI.
import boto3
sagemaker = boto3.client('sagemaker')
# create_training_params is a dict of CreateTrainingJob request fields
sagemaker.create_training_job(**create_training_params)
Here is a link to the documentation on these options: https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-train-model-create-training-job.html
Use the Spark interface to launch it, using an interface similar to creating an MLlib training job:
val estimator = new KMeansSageMakerEstimator(
sagemakerRole = IAMRole(roleArn),
trainingInstanceType = "ml.p2.xlarge",
trainingInstanceCount = 1,
endpointInstanceType = "ml.c4.xlarge",
endpointInitialInstanceCount = 1)
.setK(10).setFeatureDim(784)
val model = estimator.fit(trainingData)
Here is a link to the spark-sagemaker library: https://github.com/aws/sagemaker-spark
Create a training job in the Amazon SageMaker console using the wizard there: https://console.aws.amazon.com/sagemaker/home?region=us-east-1#/jobs
Please note that there are also a few options for what to train: you can use the built-in algorithms such as K-Means, Linear Learner or XGBoost (see here for the complete list: https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html), or you can bring your own models using pre-built Docker images such as TensorFlow (https://docs.aws.amazon.com/sagemaker/latest/dg/tf.html) or MXNet (https://docs.aws.amazon.com/sagemaker/latest/dg/mxnet.html), or your own Docker image (https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html).
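The low-level option above passes a `create_training_params` dict without showing its contents. As a hedged sketch, it could be assembled roughly as below. The helper `build_training_params` is a hypothetical name, and the instance type, volume size, runtime limit, and all argument values are placeholders to adapt to your own account, image, and data.

```python
from typing import Any, Dict


def build_training_params(job_name: str, image_uri: str, role_arn: str,
                          train_s3_uri: str, output_s3_uri: str) -> Dict[str, Any]:
    """Assemble a request dict for the CreateTrainingJob API."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            # Built-in algorithm image or your own Docker image.
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": "ml.m4.xlarge",  # placeholder instance type
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


create_training_params = build_training_params(
    job_name="my-training-job",                       # placeholder
    image_uri="<training-image-uri>",                 # placeholder
    role_arn="<sagemaker-execution-role-arn>",        # placeholder
    train_s3_uri="s3://<bucket>/train/",              # placeholder
    output_s3_uri="s3://<bucket>/output/",            # placeholder
)
print(sorted(create_training_params))
```

With real values filled in, this dict is what gets unpacked into `create_training_job(**create_training_params)`. No notebook instance is involved at any point; the training set only needs to be in S3.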
The docs for setting up Google Cloud ML suggest installing Tensorflow version r0.11. I've observed that TensorFlow functions newly available in r0.12 raise exceptions when run on Cloud ML. Is there a timeline for Cloud ML supporting r0.12? Will switching between r0.11 and r0.12 be optional or mandatory?
Yes, you can specify --runtime-version=0.12 to get a 0.12 build. This is a new feature and is documented at https://cloud.google.com/ml/docs/concepts/runtime-version-list
Note, however, that the 0.12 build is not yet considered stable and the exact TensorFlow build provided may change. Once the 1.0 version of TensorFlow is available, it will also be supported, and the pre-1.0 versions of TensorFlow will begin to be deprecated.
(See https://cloud.google.com/sdk/gcloud/reference/beta/ml/jobs/submit/training for usage.)