VertexAI Batch Inference Failing for Custom Container Model - google-cloud-platform

I'm having trouble executing Vertex AI's batch inference, even though endpoint deployment and online inference work perfectly. My TensorFlow model was trained in a custom Docker container with the following arguments:
aiplatform.CustomContainerTrainingJob(
    display_name=display_name,
    command=["python3", "train.py"],
    container_uri=container_uri,
    model_serving_container_image_uri=container_uri,
    model_serving_container_environment_variables=env_vars,
    model_serving_container_predict_route='/predict',
    model_serving_container_health_route='/health',
    model_serving_container_command=[
        "gunicorn",
        "src.inference:app",
        "--bind",
        "0.0.0.0:5000",
        "-k",
        "uvicorn.workers.UvicornWorker",
        "-t",
        "6000",
    ],
    model_serving_container_ports=[5000],
)
I have predict and health endpoints defined essentially as below:
@app.get("/health")
def health_check_batch():
    return 200

@app.post("/predict")
def predict_batch(request_body: dict):
    pred_df = pd.DataFrame(request_body['instances'],
                           columns=request_body['parameters']['columns'])
    # do some model inference things
    return {"predictions": predictions.tolist()}
As described, after training a model and deploying it to an endpoint, I can successfully hit the API with a JSON payload like:
{"instances":[[1,2], [1,3]], "parameters":{"columns":["first", "second"]}}
This also works through the endpoint's Python SDK, passing instances and parameters as function arguments.
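For example, this is roughly the SDK call that works against the deployed endpoint (the project, region, and endpoint ID are placeholders):

from google.cloud import aiplatform

endpoint = aiplatform.Endpoint(
    "projects/PROJECT/locations/REGION/endpoints/ENDPOINT_ID"  # placeholder resource name
)
response = endpoint.predict(
    instances=[[1, 2], [1, 3]],
    parameters={"columns": ["first", "second"]},
)
print(response.predictions)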
However, I've tried performing batch inference with both a CSV file and a JSONL file, and every time it fails with Error Code 3. I also can't find any logs in Logs Explorer explaining why it failed. I've read through all the documentation I could find and have seen others successfully invoke batch inference, but I haven't been able to find a guide. Does anyone have recommendations on the batch file structure or the structure of my APIs? Thank you!
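For reference, this is roughly how I am submitting the batch job (shown here via the SDK; bucket paths, job name, and machine type are placeholders, and model is the uploaded aiplatform.Model). Each line of the JSONL file is a single instance, e.g. [1, 2] on its own line:

batch_job = model.batch_predict(
    job_display_name="my-batch-job",                       # placeholder
    gcs_source="gs://my-bucket/batch/input.jsonl",         # placeholder
    gcs_destination_prefix="gs://my-bucket/batch/output",  # placeholder
    instances_format="jsonl",
    machine_type="n1-standard-4",                          # placeholder
    # model_parameters={"columns": ["first", "second"]},   # may be a way to pass parameters; unverified
)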

Related

Try deploying a custom ML model with an endpoint by using a custom image on Google Cloud's Vertex AI

I have been banging my head against this for a while, and Google Cloud does not have much documentation about this issue. What I am trying to do is deploy a custom ML model on Google Cloud Vertex AI by:
1. Uploading the model to the Model Registry in Vertex AI
2. Creating an endpoint
3. Deploying the uploaded model to the created endpoint
Steps 1 and 2 are easy to implement, and I am not facing any issues. However, step 3 always fails for some reason. Even the logs don't give me much information.
For Step 1:
This is the Dockerfile I am using to create a custom image to serve my ML model:
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim
COPY requirements-base.txt requirements.txt
RUN pip3 install --no-cache-dir -r requirements.txt
COPY serve.py serve.py
COPY model.pkl model.pkl
And this is what my serve.py file looks like:
from fastapi import Request, FastAPI, Response
import json
import catboost
import pickle
import os

app = FastAPI(title="Sentiment Analysis")

AIP_HEALTH_ROUTE = os.environ.get('AIP_HEALTH_ROUTE', '/health')
AIP_PREDICT_ROUTE = os.environ.get('AIP_PREDICT_ROUTE', '/predict')

@app.get(AIP_HEALTH_ROUTE, status_code=200)
async def health():
    return {'health': 'ok'}

@app.post(AIP_PREDICT_ROUTE)
async def predict(request: Request):
    with open('model.pkl', 'rb') as file:
        model = pickle.load(file)
    data = request.get_json()
    input_data = data['input']
    predictions = model.predict(input_data)
    return json.dumps({'predictions': predictions.tolist()})

if __name__ == '__main__':
    app.run(debug=True, host="0.0.0.0", port=8080)
After building the image, I push it to Artifact Registry on Google Cloud.
Is there an issue with how I have written the serve.py file or the Dockerfile?
Or is there an easier way to deploy custom ML models on Google Cloud for MLOps and prediction purposes?
I have tried a couple of approaches: manually from the Google Cloud Vertex AI console, and also using gcloud commands.
In the manual process, after importing the model with the custom image, I clicked on deploy to an endpoint, but this always seems to fail and takes forever.
Similarly, using gcloud, I first create an endpoint, then upload my model to the registry, and then deploy the model to the endpoint I created. But this approach also fails.
At the end of the day, I want my model to be successfully deployed on the endpoint and to give the right answers for predictions. Or, alternatively, to somehow host my custom ML model on Google Cloud and make predictions with it in a reasonable and manageable way!
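For completeness, the equivalent of steps 1-3 in the Python SDK would look roughly like this (the project, location, display names, image URI, and machine type are placeholders):

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholders

# Step 1: upload the model with the custom serving image
model = aiplatform.Model.upload(
    display_name="sentiment-analysis",                            # placeholder
    serving_container_image_uri="<artifact-registry-image-uri>",  # placeholder
    serving_container_predict_route="/predict",
    serving_container_health_route="/health",
    serving_container_ports=[8080],
)

# Step 2: create an endpoint
endpoint = aiplatform.Endpoint.create(display_name="sentiment-analysis-endpoint")

# Step 3: deploy the uploaded model to the endpoint
model.deploy(endpoint=endpoint, machine_type="n1-standard-2", traffic_percentage=100)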

Error when invoking pre-trained model: NotFittedError("Vocabulary not fitted or provided")

I'm new to AWS SageMaker, and I'm trying to deploy a simple pre-trained model to SageMaker to create an endpoint and then make predictions.
The model is an sklearn linear regression model; the input is a vectorized sparse matrix derived from a string of text (a customer's review), and the output is the star-rating value (1 to 5).
I have trained the model locally and exported its artifact to a model.joblib file.
Then I configure the inference.py file and zip it together with the model.joblib file into a model.tar.gz file, which is then uploaded to S3 for model registration and endpoint creation.
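Roughly, the export and packaging step looks like this (file names follow the description above; the exact layout SageMaker expects for the entry script may differ):

import tarfile
import joblib

# Export the locally fitted estimator
joblib.dump(model, "model.joblib")

# Bundle the artifact with the inference script, then upload model.tar.gz to S3
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("model.joblib")
    tar.add("inference.py")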
However, when I invoke the endpoint on a sample text, the following error is returned in the CloudWatch log:
File "/miniconda3/lib/python3.8/site-packages/sklearn/feature_extraction/text.py", line 498, in _check_vocabulary
raise NotFittedError("Vocabulary not fitted or provided")
I understand this to mean that SageMaker is complaining that the trained model artifact is not fitted, and that there is no problem with the other parts (such as the inference.py file). However, the pre-trained model was fitted before exporting.
I'm not sure which part is wrong, so I haven't posted any more code, to avoid clutter.
Thank you.

SageMaker Monitoring 'describe_processing_job' Error

I am following the steps in the SageMaker Monitoring Tutorial here: https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_model_monitor/introduction/SageMaker-ModelMonitoring.html
And I get an error from sage.describe_processing_job() that mentions an /opt/ml... directory.
I am not using any directory like /opt/ml... in my code. What is the /opt/ml... directory mentioned in the error, and how can I fix it?
To answer your question, please share the code you are using (the link seems to be broken). The /opt/ml/... paths mentioned in the error are the container-local directories where SageMaker mounts a processing job's inputs and outputs, not directories from your own code. For reference, I'm sharing code that creates and runs a processing job to set up model monitoring. Using this code you can run the processing job manually (without having to wait for the model monitoring job to run on its hourly schedule), as long as you adhere to the API contracts.
Reference code
from sagemaker.processing import Processor, ProcessingInput, ProcessingOutput

processor = Processor(
    base_job_name='nlp-data-drift-bert-v1',
    role=role,
    image_uri=processing_repository_uri,
    instance_count=1,
    instance_type='ml.m5.large',
    env={'THRESHOLD': '0.5', 'bucket': bucket},
)

processor.run(
    inputs=[ProcessingInput(
        input_name='endpointdata',
        source=f's3://{sagemaker_session.default_bucket()}/{s3_prefix}/endpoint/data_capture',
        destination='/opt/ml/processing/input/endpoint',  # container-local input path
    )],
    outputs=[ProcessingOutput(
        output_name='result',
        source='/opt/ml/processing/resultdata',           # container-local output path
        destination=destination,
    )],
)
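To see why a processing job failed (including any complaints about /opt/ml/... paths), you can inspect the describe output after running it; FailureReason is populated for failed jobs:

# Hedged sketch: inspect the most recent processing job started by this processor
details = processor.latest_job.describe()
print(details["ProcessingJobStatus"])
print(details.get("FailureReason"))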

How to publish an exported Google AutoML machine as an API?

I have built a machine learning model using Google's AutoML Tables interface. Once the model was trained, I exported it in a docker container to my local machine by following the steps detailed on Google's official documentation page: https://cloud.google.com/automl-tables/docs/model-export. Now, on my machine, it exists inside a Docker container, and I am able to run it successfully using the following command:
docker run -v exported_model:/models/default/0000001 -p 8080:8080 -it gcr.io/cloud-automl-tables-public/model_server
Once running as a local host via Docker, I am able to make predictions using the following python code:
import requests
import json

vector = [1, 1, 1, 1, 1, 2, 1]

input = {"instances": [{"column_1": vector[0],
                        "column_2": vector[1],
                        "column_3": vector[2],
                        "column_4": vector[3],
                        "column_5": vector[4],
                        "column_6": vector[5],
                        "column_7": vector[6]}]}

jsonData = json.dumps(input)
response = requests.post("http://localhost:8080/predict", jsonData)
print(response.json())
I need to publish this model as an API to be used by my client. I have considered AWS EC2 and Azure Functions; however, I have not had any success so far. Ideally, I would like to use FastAPI, but I do not know how to do this in a dockerized context.
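For context, the kind of FastAPI wrapper I have in mind would simply forward requests to the exported model_server container (the route and container URL are placeholders):

# Hypothetical FastAPI proxy in front of the exported AutoML container
import requests
from fastapi import FastAPI, Request

MODEL_SERVER_URL = "http://localhost:8080/predict"  # placeholder; the model_server address

app = FastAPI()

@app.post("/predict")
async def predict(request: Request):
    payload = await request.json()  # pass the {"instances": [...]} body through
    response = requests.post(MODEL_SERVER_URL, json=payload)
    return response.json()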
I have since solved the problem; however, it comes at a cost. It is possible to deploy an AutoML model on GCP, but it will incur a small charge over time. I don't think there is a way around this, otherwise Google would lose revenue.
Once the model is up and running, the following Python code can be used to make predictions:
from google.cloud import automl_v1beta1 as automl

project_id = 'chatbot-286a1'
compute_region = 'us-central1'
model_display_name = 'DNALC_4p_global_20201112093636'
inputs = {'sessionDuration': 60.0, 'createdStartDifference': 1206116.042162,
          'confirmedStartDifference': 1206116.20255, 'createdConfirmedDifference': -0.160388}

client = automl.TablesClient(project=project_id, region=compute_region)

feature_importance = False
if feature_importance:
    response = client.predict(
        model_display_name=model_display_name,
        inputs=inputs,
        feature_importance=True,
    )
else:
    response = client.predict(
        model_display_name=model_display_name, inputs=inputs
    )

print("Prediction results:")
for result in response.payload:
    print("Predicted class name: {}".format(result.tables.value))
    print("Predicted class score: {}".format(result.tables.score))
    break
In order for the code to work, the following resources may be helpful:
Installing AutoML in python:
How to install google.cloud automl_v1beta1 for python using anaconda?
Authenticating AutoML in python:
https://cloud.google.com/docs/authentication/production
(Remember to put the path to the JSON authentication key file in an environment variable; this is for security purposes.)
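For example, the key file path can be set from Python before constructing the client (the path is illustrative):

import os

# Point the Google client libraries at the downloaded service-account key
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account-key.json"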

How to deploy our own TensorFlow Object Detection Model in Amazon SageMaker?

I have my own trained TF Object Detection model. When I try to deploy the same model in AWS SageMaker, it does not work.
I have tried TensorFlowModel() in SageMaker, but there is an argument called entry_point. How do I create that .py file for prediction?
entry_point is an argument that contains the file name of your inference.py. Once you create an endpoint and try to predict on an image using the invoke-endpoint API, an instance of the type you specified is created, and the inference.py script is executed to handle the request.
Link: Documentation for TensorFlow model deployment in Amazon SageMaker
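A rough sketch of how entry_point is wired up when creating and deploying the model (the model data path, role ARN, framework version, and instance type are placeholders):

from sagemaker.tensorflow import TensorFlowModel

model = TensorFlowModel(
    model_data="s3://my-bucket/model/model.tar.gz",       # placeholder
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    entry_point="inference.py",                           # your pre/post-processing script
    framework_version="2.8",                              # placeholder; match your TF version
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.large")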
The inference script must contain input_handler and output_handler methods, or a single handler method that covers both, in the inference.py script; these handle the pre- and post-processing of your image.
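For example, a minimal inference.py following the (data, context) handler convention of the SageMaker TensorFlow Serving container might look like this (the pre/post-processing bodies are illustrative):

import json

def input_handler(data, context):
    # Pre-process the request before it is forwarded to TensorFlow Serving
    if context.request_content_type == "application/json":
        payload = json.loads(data.read().decode("utf-8"))
        # e.g. decode/resize/normalize the image here
        return json.dumps({"instances": payload["instances"]})
    raise ValueError("Unsupported content type: {}".format(context.request_content_type))

def output_handler(data, context):
    # Post-process the TensorFlow Serving response before returning it to the caller
    response_content_type = context.accept_header
    prediction = data.content
    return prediction, response_content_type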
Example of deploying a TensorFlow model
In the above link, I have mentioned a Medium post; it should help with your doubts.