SpaCy-based custom prediction on Google AI Platform - google-cloud-platform

I'm trying to run a custom prediction routine on Google's AI Platform, but I always get an error when I include spaCy as a required package in my setup.py:
gcloud beta ai-platform versions create v1 --model MODEL_NAME --python-version=3.7 --runtime-version=1.15 --package-uris=gs://PATH_TO_PACKAGE --machine-type=mls1-c4-m2 --origin=gs://PATH_TO_MODEL --prediction-class=basic_predictor.BasicPredictor
Using endpoint [https://ml.googleapis.com/]
Creating version (this might take a few minutes)......failed.
ERROR: (gcloud.beta.ai-platform.versions.create) Create Version failed. Bad model detected with error: "There was a problem processing the user code: basic_predictor.BasicPredictor cannot be found. Please make sure (1) prediction_class is the fully qualified function name, and (2) it uses the correct package name as provided by the package_uris: ['gs://PATH_TO_PACKAGE'] (Error code: 4)"
As soon as I remove spaCy as a dependency, the AI Platform is able to create the version, so it looks like incorrect function names or package names cannot be the problem. Obviously, my model relies on spaCy, so leaving it out is not an option.
Does anyone know how to fix this?

This seems to be an issue with how the dependencies are installed on AI Platform prediction nodes. I replicated the issue and got the same error; I also tried packaging the library as a tar.gz file, but it failed in the same way.
I went ahead and reported this issue in the GCP Issue Tracker so the AI Platform team can investigate it. You can subscribe to it to receive notifications whenever there's an update.
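For anyone trying to reproduce this, a minimal custom-prediction setup.py along these lines is enough to trigger it; the package name, version pin, and module layout below are illustrative assumptions, not the asker's actual files:

from setuptools import setup

setup(
    name="basic_predictor",              # illustrative package name
    version="0.1",
    scripts=["basic_predictor.py"],      # module that defines BasicPredictor
    install_requires=["spacy>=2.2,<3"],  # removing this line lets version creation succeed
)

Version creation only fails once install_requires pulls in spaCy, which points at dependency installation on the prediction nodes rather than at the prediction_class path itself.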

Related

Dataproc custom image: Cannot complete creation

For a project, I have to create a Dataproc cluster that uses one of the outdated image versions (for example, 1.3.94-debian10) that contain the vulnerabilities in the Apache Log4j 2 utility. The goal is to trigger the related alert (DATAPROC_IMAGE_OUTDATED) in order to check how SCC works (it is just for a test environment).
I tried to run the command gcloud dataproc clusters create dataproc-cluster --region=us-east1 --image-version=1.3.94-debian10 but got the following message ERROR: (gcloud.dataproc.clusters.create) INVALID_ARGUMENT: Selected software image version 1.3.94-debian10 is vulnerable to remote code execution due to a log4j vulnerability (CVE-2021-44228) and cannot be used to create new clusters. Please upgrade to image versions >=1.3.95, >=1.4.77, >=1.5.53, or >=2.0.27. For more information, see https://cloud.google.com/dataproc/docs/guides/recreate-cluster, which makes sense, in order to protect the cluster.
I did some research and discovered that I will have to create a custom image with said version and generate the cluster from that. The thing is, I have tried to read the documentation and find some tutorial, but I still can't understand how to start or how to run the file generate_custom_image.py, for example, since I am not comfortable with Cloud Shell (I prefer the console).
Can someone help? Thank you
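Not a full answer, but a rough sketch of how the custom image is usually built from a terminal; the flag names follow the GoogleCloudDataproc/custom-images README and may have changed, and the image name, customization script, zone, and bucket below are placeholders:

python generate_custom_image.py \
  --image-name custom-1-3-94-debian10 \
  --dataproc-version 1.3.94-debian10 \
  --customization-script customize.sh \
  --zone us-east1-b \
  --gcs-bucket gs://my-custom-image-bucket

Once the image exists, the cluster is created by pointing gcloud dataproc clusters create at it with --image=custom-1-3-94-debian10 instead of --image-version (as far as I recall; check the custom-images docs for the exact flag).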

Tensorflow on Amazon SageMaker

I need to deploy a custom object detection model using the TensorFlow AWS API on SageMaker, following this tutorial: https://github.com/aws-samples/amazon-sagemaker-tensorflow-object-detection-api
I'm getting this error whenever I try to deploy using this code:
predictor = model_endpoint.deploy(initial_instance_count=1, instance_type='ml.m5.large')
The problem:
update_endpoint is a no-op in sagemaker>=2.
Can you help me solve this, please?
Or can you tell me how to deploy a custom detection model on SageMaker?
Can you try using model_endpoint.update_endpoint(...)? Alternatively, you can find examples here for deploying a Tensorflow model - https://github.com/RamVegiraju/SageMaker-Deployment/tree/master/RealTime/Script-Mode/TensorFlow.
According to the documentation:
The update_endpoint argument in deploy() methods for estimators and
models is now a no-op. Please use
sagemaker.predictor.Predictor.update_endpoint() instead.
However, I recently successfully deployed a TensorFlow 2.7.0 model with SageMaker 2.70.0; as far as I know, this is a warning, not a breaking-change error.
The errors you are seeing must come from something else (bear in mind that it is a warning, not a breaking change, as of the time of this comment and these versions of the dependencies).
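If the goal is simply to get a TensorFlow model behind an endpoint, a minimal sketch with the SageMaker Python SDK v2 looks like this; the S3 path, role ARN, and framework version are placeholder assumptions:

from sagemaker.tensorflow import TensorFlowModel

# Wrap a trained model artifact and deploy it to a real-time endpoint.
model = TensorFlowModel(
    model_data="s3://my-bucket/model/model.tar.gz",         # hypothetical model artifact
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # hypothetical execution role
    framework_version="2.7",
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.large")

Note that in SDK v2 the deprecated update_endpoint argument is simply not passed; updating an existing endpoint goes through sagemaker.predictor.Predictor.update_endpoint(), as the quoted documentation says.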

Prediction on GCP with ML Engine

I am working on GCP to run predictions with the census dataset; I'm just discovering the Google APIs (ML Engine, etc.).
When I launch the prediction job, the job runs successfully, but it doesn't display the result.
Can anyone help? Do you have any idea why it doesn't generate an output?
Thanks in advance :)
This is the error that occurs
https://i.stack.imgur.com/9gyTb.png
This error is common when you train with one version of TF and then try serving with a lower version. For instance, if you are using Cloud console to deploy your model, it currently has no way of letting you select the version of TensorFlow for serving, so the model is deployed using TF 1.0, but your model may have been trained with a higher version of TF (current version is 1.7).
Although the Cloud console doesn't currently let you select the version (but it will soon!), using gcloud or the REST API directly does allow you to.
In the docs, there is a section on creating a model that has code snippets under "gcloud" and "python". With gcloud you simply add the argument --runtime-version=1.6 (or whatever version) and with python you add the property "runtimeVersion": "1.6" to the body of the request.
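As a hedged illustration (model, version, and bucket names are placeholders), the two variants look roughly like this:

gcloud ml-engine versions create v1 \
  --model census \
  --origin gs://my-bucket/census/model/ \
  --runtime-version=1.6

and the equivalent request body for projects.models.versions.create:

{
  "name": "v1",
  "deploymentUri": "gs://my-bucket/census/model/",
  "runtimeVersion": "1.6"
}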

google cloud machine learning error

Please help me, I cannot solve this:
ERROR: (gcloud.beta.ml.models.versions.create) FAILED_PRECONDITION: Field: version.deployment_uri Error: The model directory gs://valued-aquifer-164405-ml/mnist_deployable_garu_20170413_150711/model/ is expected to contain exactly one of the following: the 'export.meta' file, or 'saved_model.pb' file or 'saved_model.pbtxt' file.Please make sure one of these files exists and you have read access to it.
I am new to Google Cloud and ran into the same kind of issue when trying to create a version for a model, and I resolved it.
You need to do two steps:
Export the model --> this gives you saved_model.pbtxt. I am using TensorFlow, so I used export_savedmodel().
Upload saved_model.pbtxt and the variables folder to storage.
Then try again.
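A hedged sketch of those two steps with the TF 1.x Estimator API; the feature spec, shapes, and paths are illustrative, and `estimator` is assumed to be an already trained tf.estimator.Estimator:

import tensorflow as tf

# Illustrative serving input: parse serialized tf.Example protos with a single feature "x".
feature_spec = {"x": tf.FixedLenFeature([784], tf.float32)}

def serving_input_fn():
    serialized = tf.placeholder(tf.string, shape=[None], name="input_example")
    features = tf.parse_example(serialized, feature_spec)
    return tf.estimator.export.ServingInputReceiver(features, {"examples": serialized})

# Step 1: export the SavedModel (writes saved_model.pb(txt) plus a variables/ folder).
export_dir = estimator.export_savedmodel("exported_model", serving_input_fn)

# Step 2: upload the export to Cloud Storage, e.g.:
#   gsutil cp -r exported_model/* gs://MY_BUCKET/model/

Then point the version's deployment URI at gs://MY_BUCKET/model/ when creating it.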
This command has since been updated to gcloud ml-engine versions create.
It is recommended to run gcloud components update to install the latest gcloud components, then follow the new instructions for deploying your own models to Cloud ML Engine.
Note: If you experience issues with gcloud in the future, it is recommended to report the issue in the Public Issue Tracker.

Failed to run the inference graph - what could be wrong?

I am trying to deploy a locally trained model. I followed all of the instructions here for model preparation and I managed to deploy it.
However, when I try to get predictions, the online prediction responds with a 502 Server Error and the batch prediction returns ('Failed to run the inference graph', 1).
Is there a way to get a better error message to narrow down what's wrong?
Thanks
The error message indicates that the failure occurred when running the session for the inference graph. It might be possible to uncover what is happening with some code that uses the model locally. One way to test it is to create a small input dataset and feed it to the inference graph to check whether you can run the session locally.
You may refer to local_predict.py in samples/mnist/deployable/ in the SDK for how to do that. Here is an example use:
python local_predict.py --input=/path/to/my/local/files --model_dir=/path/to/modeldir.
Note that model_dir points to where the TensorFlow meta graph proto and checkpoint files are saved; they are generated by training. Here is the doc link about how to train a model: https://cloud.google.com/ml/docs/how-tos/training-models. The model dir can be on GCS as well.
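If local_predict.py is not at hand, a hedged sketch of the same idea in raw TF 1.x is below; the tensor names, input shape, and paths are hypothetical, so inspect your own exported graph for the real ones:

import os
import numpy as np
import tensorflow as tf

model_dir = "/path/to/modeldir"                       # contains export.meta and checkpoint files
sample_batch = np.zeros((1, 784), dtype=np.float32)  # hypothetical small input batch

with tf.Session() as sess:
    # Rebuild the inference graph from the exported meta graph and restore the weights.
    saver = tf.train.import_meta_graph(os.path.join(model_dir, "export.meta"))
    saver.restore(sess, tf.train.latest_checkpoint(model_dir))
    # Placeholder tensor names; look up the real input/output names in your graph.
    inputs = sess.graph.get_tensor_by_name("input:0")
    outputs = sess.graph.get_tensor_by_name("output:0")
    print(sess.run(outputs, feed_dict={inputs: sample_batch}))

If this fails locally in the same way, the problem is in the exported graph rather than in the service.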
Thanks for bringing this up. We're continually working to improve the overall experience of the service including error reporting.