I need to deploy a custom object detection model on AWS using the TensorFlow Object Detection API, following this tutorial: https://github.com/aws-samples/amazon-sagemaker-tensorflow-object-detection-api
I get an error whenever I try to deploy with this code:
predictor = model_endpoint.deploy(initial_instance_count=1, instance_type='ml.m5.large')
The problem:
update_endpoint is a no-op in sagemaker>=2.
Can you help me solve this, please?
Or can you tell me how to deploy a custom detection model on SageMaker?
Can you try using model_endpoint.update_endpoint(...)? Alternatively, you can find examples of deploying a TensorFlow model here: https://github.com/RamVegiraju/SageMaker-Deployment/tree/master/RealTime/Script-Mode/TensorFlow
According to the documentation:
The update_endpoint argument in deploy() methods for estimators and
models is now a no-op. Please use
sagemaker.predictor.Predictor.update_endpoint() instead.
However, I recently deployed a TensorFlow 2.7.0 model successfully with SageMaker 2.70.0. As far as I know, this is a warning, not a breaking-change error.
Any errors you are seeing are most likely caused by something else, not by this message (bear in mind that it is a warning, not a breaking change, as of the time of this comment and with these dependency versions).
Related
I'm trying to run a custom prediction routine on Google's AI Platform, but always get an error when I include spaCy as a required package in my setup.py:
gcloud beta ai-platform versions create v1 --model MODEL_NAME --python-version=3.7 --runtime-version=1.15 --package-uris=gs://PATH_TO_PACKAGE --machine-type=mls1-c4-m2 --origin=gs://PATH_TO_MODEL --prediction-class=basic_predictor.BasicPredictor
Using endpoint [https://ml.googleapis.com/]
Creating version (this might take a few minutes)......failed.
ERROR: (gcloud.beta.ai-platform.versions.create) Create Version failed. Bad model detected with error: "There was a problem processing the user code: basic_predictor.BasicPredictor cannot be found. Please make sure (1) prediction_class is the fully qualified function name, and (2) it uses the correct package name as provided by the package_uris: ['gs://PATH_TO_PACKAGE'] (Error code: 4)"
As soon as I remove spaCy as a dependency, the AI Platform is able to create the version, so it looks like incorrect function names or package names cannot be the problem. Obviously, my model relies on spaCy, so leaving it out is not an option.
Does anyone know how to fix this?
This seems to be an issue with how dependencies are installed on AI Platform prediction nodes. I replicated the issue and got the same error. I also tried packaging the library as a tar.gz file, but it failed in the same way.
I went ahead and reported this issue in the GCP IssueTracker so the AI Platform team can investigate it. You can subscribe to the issue to receive notifications whenever there is an update.
I am looking for a solution that allows me to host my trained Sklearn model (that I am satisfied with) on SageMaker without having to retrain it before deploying to an endpoint.
On the one hand, I have seen specific examples for bring-your-own scikit models that involve containerizing the trained model, but these guides go through the training step and don't show how to skip retraining and just deploy. (https://github.com/awslabs/amazon-sagemaker-examples/blob/27d3aeb9166a4d4dbbb0721d381329e41d431078/advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.ipynb)
On the other hand, there are guides that show how to BYOM for deployment only, but these are specific to the MXNet and TensorFlow frameworks. I noticed that the way you export model artifacts differs among frameworks. I need something specific to Sklearn that gets me to the point where my model artifacts are in the format SageMaker expects. (https://github.com/awslabs/amazon-sagemaker-examples/tree/27d3aeb9166a4d4dbbb0721d381329e41d431078/advanced_functionality/mxnet_mnist_byom)
The closest guide I have seen that might work is this one: https://aws.amazon.com/blogs/machine-learning/bring-your-own-pre-trained-mxnet-or-tensorflow-models-into-amazon-sagemaker/
However, I don't know what my sklearn "model artifacts" include. I think I need a clear understanding of what sklearn model artifacts look like and what they contain.
Any help is appreciated. The goal is to avoid training in SageMaker and only deploy my already trained scikit model to an endpoint.
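As a rough sketch of what such artifacts can look like: SageMaker generally expects model artifacts as a model.tar.gz archive, which for a scikit-learn model would typically hold the serialized estimator. The example below only shows the packaging step. The helper name package_model, the file name model.pkl, and the stand-in "model" (a plain dict, used so the snippet runs without scikit-learn) are all hypothetical; with a real estimator, joblib.dump is a common alternative to pickle.

```python
import os
import pickle
import tarfile
import tempfile


def package_model(model, out_dir, model_filename="model.pkl"):
    """Serialize `model` and wrap it in a model.tar.gz archive.

    `model_filename` is a hypothetical name; the inference code that
    loads the model must look for the same file name.
    """
    model_path = os.path.join(out_dir, model_filename)
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    archive_path = os.path.join(out_dir, "model.tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps the file at the archive root, without local paths.
        tar.add(model_path, arcname=model_filename)
    return archive_path


# Stand-in for an already trained estimator (assumption for illustration).
trained_model = {"coef": [0.1, 0.2], "intercept": 0.5}

work_dir = tempfile.mkdtemp()
archive = package_model(trained_model, work_dir)

# List the archive contents to confirm what would be uploaded.
with tarfile.open(archive, "r:gz") as tar:
    members = tar.getnames()
print(members)
```

The archive would then be uploaded to S3 and referenced when creating the model. That part is not shown here because it depends on your account, bucket, and SDK version.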
I am working on GCP to run predictions using the census dataset. I am still discovering the Google APIs (ML Engine, ...).
When I launch the prediction job, it runs successfully, but it doesn't display the result.
Can anyone help? Do you have any idea why it doesn't generate any output?
Thanks in advance :)
This is the error that occurs
https://i.stack.imgur.com/9gyTb.png
This error is common when you train with one version of TF and then try serving with a lower version. For instance, if you are using Cloud console to deploy your model, it currently has no way of letting you select the version of TensorFlow for serving, so the model is deployed using TF 1.0, but your model may have been trained with a higher version of TF (current version is 1.7).
Although the Cloud console doesn't currently let you select the version (but it will soon!), using gcloud or the REST API directly does allow you to.
In the docs, there is a section on creating a model that has code snippets under "gcloud" and "python". With gcloud, you simply add the argument --runtime-version=1.6 (or whichever version you need), and with Python you add the property "runtimeVersion": "1.6" to the body of the request.
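A minimal sketch of the Python route, assuming the request body is a plain dictionary: the helper name build_version_body and the example values are hypothetical, and only the runtimeVersion property comes from the description above. The name and deploymentUri fields are assumed to be the other fields the version resource needs.

```python
import json


def build_version_body(name, deployment_uri, runtime_version="1.6"):
    """Build the body of a 'create version' request.

    runtime_version should match the TensorFlow version used for training.
    """
    return {
        "name": name,
        "deploymentUri": deployment_uri,   # GCS path to the exported model
        "runtimeVersion": runtime_version,
    }


# Example values are placeholders, not real resources.
body = build_version_body("v1", "gs://my-bucket/model/")
print(json.dumps(body, indent=2))
```

The dictionary would then be passed as the body of the version-creation call. How that call is made depends on the client library you use.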
Please help me.
I cannot solve this:
ERROR: (gcloud.beta.ml.models.versions.create) FAILED_PRECONDITION: Field: version.deployment_uri Error: The model directory gs://valued-aquifer-164405-ml/mnist_deployable_garu_20170413_150711/model/ is expected to contain exactly one of the following: the 'export.meta' file, or 'saved_model.pb' file or 'saved_model.pbtxt' file.Please make sure one of these files exists and you have read access to it.
I am new to Google Cloud. I ran into the same kind of issue when I was trying to create a version for a model, and I have resolved it.
You need to do two steps:
1. Export the model. This gives you saved_model.pbtxt. I am using TensorFlow, so I used export_savedmodel().
2. Upload saved_model.pbtxt and the variables folder to Cloud Storage.
Then try again.
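Before uploading, the export directory can be checked locally against the rule stated in the error message (exactly one of export.meta, saved_model.pb, or saved_model.pbtxt). This is only a sketch: the helper names are hypothetical, and the check runs against a local directory, not against GCS.

```python
import os
import tempfile

# File names taken from the error message above.
EXPECTED_FILES = ("export.meta", "saved_model.pb", "saved_model.pbtxt")


def find_model_files(model_dir):
    """Return the expected model files that are present in model_dir."""
    present = set(os.listdir(model_dir))
    return [name for name in EXPECTED_FILES if name in present]


def is_deployable(model_dir):
    """True when exactly one of the expected files exists."""
    return len(find_model_files(model_dir)) == 1


# Demonstration with a temporary directory standing in for the export dir.
with tempfile.TemporaryDirectory() as export_dir:
    empty_ok = is_deployable(export_dir)  # nothing exported yet

    # Simulate the result of the export step.
    open(os.path.join(export_dir, "saved_model.pbtxt"), "w").close()
    os.mkdir(os.path.join(export_dir, "variables"))
    exported_ok = is_deployable(export_dir)

print(empty_ok, exported_ok)
```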
This command has since been updated to gcloud ml-engine versions create.
It is recommended to run gcloud components update to install the latest GCloud, then follow the new instructions for deploying your own models to Cloud ML Engine.
Note: If you experience issues with GCloud in the future, it is recommended to report the issue in a Public Issue Tracker.
I am trying to deploy a locally trained model. I followed all of the instructions here for model preparation and I managed to deploy it.
However, when I try to get predictions, online prediction responds with a 502 Server error and batch prediction returns ('Failed to run the inference graph', 1)
Is there a way to get a better error message to narrow down what's wrong?
Thanks
The error message indicates that the failure occurred while running the session for the inference graph. You may be able to uncover what is happening by running the model locally. One way to test it is to create a small input dataset and feed it to the inference graph to check whether the session runs locally.
You can refer to local_predict.py in samples/mnist/deployable/ in the SDK to see how to do that. Here is an example:
python local_predict.py --input=/path/to/my/local/files --model_dir=/path/to/modeldir
Note that model_dir points to where the TensorFlow meta graph proto and checkpoint files are saved. They are generated by training. The model dir can be on GCS as well. For how to train a model, see https://cloud.google.com/ml/docs/how-tos/training-models
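One way to create the small input dataset mentioned above could be sketched as follows. It assumes the input file is newline-delimited JSON with one instance per line, which may not match what your graph expects. The helper name write_sample_instances and the field names key and values are hypothetical, so replace them with the inputs your model actually takes.

```python
import json
import os
import tempfile


def write_sample_instances(path, instances):
    """Write one JSON object per line to `path` and return the path."""
    with open(path, "w") as f:
        for instance in instances:
            f.write(json.dumps(instance) + "\n")
    return path


# Two tiny, made-up instances; field names are placeholders.
samples = [
    {"key": 0, "values": [0.0, 0.5, 1.0]},
    {"key": 1, "values": [1.0, 0.5, 0.0]},
]

input_path = write_sample_instances(
    os.path.join(tempfile.mkdtemp(), "sample_input.json"), samples
)

# Read the file back to confirm every line parses as JSON.
with open(input_path) as f:
    round_trip = [json.loads(line) for line in f]
print(round_trip == samples)
```

The resulting file path can then be passed as the --input argument when running the model locally.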
Thanks for bringing this up. We're continually working to improve the overall experience of the service including error reporting.