How to do Inference with Custom Preprocessing and Data Files on Google Cloud ML

I want to use a model that I have trained for inference on Google Cloud ML. It is an NLP model, and I want my node.js server to interact with the model to get predictions at run time.
I have a process for running inference on the model manually that I would like to replicate in the cloud:
1. Use Stanford CoreNLP to tokenize my text and generate data files that store the tokenized text.
2. Have the model use those data files, create TensorFlow Examples from them, and run the model.
3. Have the model print out the predictions.
Here is how I think I can replicate it in the cloud:
1. Send the text to the cloud using my node.js server.
2. Run my Python script to generate the data file. It seems like I will have to do this inside a custom prediction routine; I'm not sure how I can use Stanford CoreNLP here.
3. Save the data file in a Google Cloud Storage bucket.
4. In the custom prediction routine, load the saved data file and execute the model.
Can anyone tell me if this process is correct? Also, how can I run Stanford CoreNLP in a Google Cloud custom prediction routine? And is there a way for me to just run command-line scripts (for example, I normally create the data files with a single command that I just run)?

You can implement a custom preprocessing method in Python and invoke the Stanford toolkit from there. See this blog and associated sample code for details: https://cloud.google.com/blog/products/ai-machine-learning/ai-in-depth-creating-preprocessing-model-serving-affinity-with-custom-online-prediction-on-ai-platform-serving
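A rough sketch of such a routine is below, assuming a pickled model and a placeholder tokenizer. The `predict`/`from_path` interface is what AI Platform custom prediction routines expect; the model filename is an assumption, and the tokenize step is where you would call out to Stanford CoreNLP (for example through a Python wrapper bundled with the routine's source package):

```python
# predictor.py -- sketch of a custom prediction routine. The model
# filename and the whitespace tokenizer are illustrative assumptions;
# replace _tokenize with a real call into Stanford CoreNLP.
import os
import pickle


class TextPredictor(object):
    def __init__(self, model):
        self._model = model

    def _tokenize(self, text):
        # Placeholder: swap in a CoreNLP call here (e.g. via a Python
        # client packaged alongside this module).
        return text.lower().split()

    def predict(self, instances, **kwargs):
        # 'instances' is the list of raw texts from the JSON request,
        # i.e. what your node.js server sends.
        features = [self._tokenize(text) for text in instances]
        return [self._model.predict(f) for f in features]

    @classmethod
    def from_path(cls, model_dir):
        # model_dir is the local copy of the GCS model directory given
        # when the version is deployed; 'model.pkl' is an assumed name.
        with open(os.path.join(model_dir, 'model.pkl'), 'rb') as f:
            return cls(pickle.load(f))
```

You package the class as a source distribution, upload it next to the model, and point the deployed version at it (the flag is `--prediction-class` in the beta gcloud surface). Note that preprocessing happens per request in memory, so you generally don't need to round-trip data files through a bucket at prediction time.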

Related

Need help in deploying custom model in AWS SageMaker

We're trying to deploy a custom optimizer model to SageMaker. Our model consists of a number of .py files distributed across the repo and some external library dependencies like ortools. Input CSV files can be put into an S3 bucket. The output of our model is a pickle file derived from the input CSV files (these will be different each time someone runs a job).
We would prefer not to use ECR, but if there's no other option, can we follow the link below to achieve what we're aiming for? The SageMaker endpoint is expected to be called from a Step Function.
https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html
I'd encourage you to check out the examples here for BYOC (Bring Your Own Container) deployment.
I'd need more information, particularly on the framework and model, to suggest further.
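For context, any BYOC image just needs to satisfy SageMaker's serving contract: respond to `GET /ping` and `POST /invocations` on port 8080, with the model artifacts unpacked into `/opt/ml/model`. A minimal sketch is below; the pickle filename and the CSV handling are assumptions for your use case:

```python
# serve.py -- minimal sketch of a BYOC inference server for SageMaker.
# The model filename and CSV parsing are illustrative assumptions.
import pickle

from flask import Flask, Response, request

app = Flask(__name__)

# SageMaker unpacks model.tar.gz from S3 into /opt/ml/model.
with open('/opt/ml/model/model.pkl', 'rb') as f:
    model = pickle.load(f)


@app.route('/ping', methods=['GET'])
def ping():
    # Health check: SageMaker expects HTTP 200 once the container is up.
    return Response(status=200)


@app.route('/invocations', methods=['POST'])
def invocations():
    # Parse CSV rows from the request body, predict, return text back.
    rows = request.data.decode('utf-8').splitlines()
    predictions = [model.predict([row.split(',')]) for row in rows]
    return Response('\n'.join(str(p) for p in predictions), mimetype='text/csv')


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
```

On avoiding ECR: as far as I know, SageMaker can only pull custom images from ECR, so for a fully custom container there isn't a way around it; the prebuilt framework containers (where you supply only a script and a requirements file) are the usual alternative.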

Custom Model for Batch Prediction on Vertex.ai

I want to run batch predictions inside Google Cloud's Vertex AI using a custom-trained model. I was able to find documentation to get online prediction working with a custom-built Docker image by setting up an endpoint, but I can't seem to find any documentation on what the Dockerfile should look like for batch prediction. Specifically, how does my custom code get fed the input, and where does it put the output?
The documentation I've found is here; it certainly looks possible to use a custom model, and when I tried it, it didn't complain at first, but eventually it threw an error. According to the documentation, no endpoint is required for running batch jobs.
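For what it's worth, my current understanding of the contract, pieced together from the custom-container docs, is sketched below: batch prediction appears to start the same container as online prediction, POST `{"instances": [...]}` batches to its predict route, and write the responses to the GCS output location configured on the batch job, so the input/output plumbing is handled outside your code. The stub model here is a stand-in:

```python
# server.py -- sketch of a Vertex AI custom container entrypoint.
# Vertex injects the AIP_* environment variables at runtime; the stub
# model stands in for whatever is baked into the image.
import os

from flask import Flask, jsonify, request

app = Flask(__name__)


class StubModel(object):
    def predict(self, instance):
        # Stand-in inference: replace with the real model call.
        return sum(instance)


model = StubModel()


@app.route(os.environ.get('AIP_HEALTH_ROUTE', '/health'), methods=['GET'])
def health():
    return 'ok', 200


@app.route(os.environ.get('AIP_PREDICT_ROUTE', '/predict'), methods=['POST'])
def predict():
    # Both online and batch prediction POST {"instances": [...]} here.
    instances = request.get_json()['instances']
    return jsonify({'predictions': [model.predict(x) for x in instances]})


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=int(os.environ.get('AIP_HTTP_PORT', 8080)))
```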

How can we add complex preprocessing in AWS Sagemaker inference

I am using AWS SageMaker to deploy my speech models trained outside of SageMaker. I am able to convert my model into something SageMaker understands and have deployed it as an endpoint. The problem is that SageMaker directly loads the model and calls .predict to get the inference, and I am unable to figure out where I can add my preprocessing functions in the deployed model. It has been suggested to use AWS Lambda or another server for preprocessing. Is there any way I can incorporate complex preprocessing (which cannot be done with a simple scikit-learn or Pandas-like framework) in SageMaker itself?
You will want to adjust the predictor.py file in the container you are bringing your speech models in. Assuming you are using Bring Your Own Container to deploy these models on SageMaker, adjust the predictor code to include the preprocessing functionality you need, and update your Dockerfile with any extra dependencies it requires. Having the preprocessing in the predictor file ensures your data is transformed and processed as you desire before predictions are returned.

This will add to the response time, however, so if you have heavy preprocessing workloads or ETL that needs to occur, you may want to look into a service such as AWS Glue (ETL) or Kinesis (real-time data streaming/transformation). If you choose to use Lambda, keep in mind its 15-minute timeout limit.
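As a concrete sketch of that change (the names here are hypothetical, not a SageMaker API), the idea is to call your preprocessing inside the invocation path, before the model:

```python
# Inside your BYOC predictor.py -- run custom preprocessing before the
# model call in the /invocations handler. extract_features is a
# hypothetical stand-in for your speech pipeline; whatever it imports
# must be installed via the Dockerfile.
def extract_features(audio_bytes):
    # Placeholder: decode the audio and compute model-ready features
    # here -- this is where your complex preprocessing lives.
    return [float(b) / 255.0 for b in audio_bytes[:16]]


def transformation(request_body, model):
    # Called from the /invocations route: preprocess, then predict.
    features = extract_features(request_body)
    return model.predict([features])
```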
I work for AWS & my opinions are my own

Data Prep preprocess to Ml Engine

Say I have code on App Engine that reads Gmail attachments, parses them into Cloud Datastore, runs the data through Dataprep recipes and steps, stores it back into Datastore, and then has an ML Engine TensorFlow model predict on it.
Is this all achievable through Dataflow?
EDIT 1:
Is it possible to export the Data Prep steps and use them as preprocessing before an Ml Engine Tensorflow model?
The input for a Cloud ML Engine model can be defined however you see fit for your project. This means you can apply the preprocessing steps the way you consider appropriate and then send your data to the TensorFlow model.
Be sure that the format you use in your Dataprep steps is supported by the TensorFlow model. Once you apply your Dataprep recipe with all the required steps, make sure that you use an appropriate format, such as CSV. It is recommended that you store your input in a Cloud Storage bucket for better access.
I don't know how familiar you are with Cloud Dataprep, but you can try this to check how to handle all the steps that you want to include in your recipe.
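For example, if your recipe exports CSV to Cloud Storage, a minimal sketch of feeding that output into a TensorFlow input pipeline could look like this (the bucket path and the three-column schema are assumptions; match the `record_defaults` to your recipe's actual output):

```python
# Sketch: reading a Dataprep CSV export from GCS into a tf.data pipeline.
# The path and column schema below are illustrative assumptions.
import tensorflow as tf

CSV_PATTERN = 'gs://my-bucket/dataprep-output/part-*.csv'  # hypothetical path


def parse_row(line):
    # Assumed schema: two float features and a float label per row.
    f1, f2, label = tf.io.decode_csv(line, record_defaults=[[0.0], [0.0], [0.0]])
    return {'f1': f1, 'f2': f2}, label


dataset = (tf.data.TextLineDataset(tf.io.gfile.glob(CSV_PATTERN))
           .map(parse_row)
           .batch(32))
```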

Google cloud ML without trainer

Can we train a model by just providing data and the related column names, without creating a trainer, in Google Cloud ML, using either the REST API or the command-line interface?
Yes. You can use Google Cloud Datalab, which comes with a structured data solution. It has an easier interface and takes care of the trainer for you. You can view the notebooks without setting up Datalab:
https://github.com/googledatalab/notebooks/tree/master/samples/ML%20Toolbox
Once you set up Datalab, you can run the notebook. To set up Datalab, check https://cloud.google.com/datalab/docs/quickstarts.
Instead of building a model and calling the Cloud ML service directly, you can try Datalab's ML Toolbox, which supports structured data and image classification. The ML Toolbox takes your data and automatically builds and trains a model; you just have to describe your data and what you want to do.
You can view the notebooks first without setting up Datalab:
https://github.com/googledatalab/notebooks/tree/master/samples/ML%20Toolbox
To set up Datalab and actually run these notebooks, see https://cloud.google.com/datalab/docs/quickstarts.