Dataprep preprocessing to ML Engine - google-cloud-platform

Say I have code on App Engine that reads Gmail attachments, parses them into Cloud Datastore, runs the data through Dataprep recipes and steps, stores the results back into Datastore, and then has an ML Engine TensorFlow model predict on them.
Is this all achievable through Dataflow?
EDIT 1:
Is it possible to export the Dataprep steps and use them as preprocessing before an ML Engine TensorFlow model?

The input for a Cloud ML Engine model can be defined however best fits your project. This means you can apply the preprocessing steps as you see fit and then send your data to the TensorFlow model.
Be sure that the format your Dataprep steps produce is supported by the TensorFlow model. Once you apply your Dataprep recipe with all the required steps, make sure you use an appropriate format, such as CSV. It is recommended to store your input in a Cloud Storage bucket for easier access.
I don't know how familiar you are with Cloud Dataprep, but you can try this to check how to handle all the steps that you want to include in your recipe.
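Dataprep recipes cannot be exported as reusable code directly, so in practice you re-implement the recipe's steps in your own preprocessing before sending data to the model. A minimal sketch in plain Python; the three steps here (dropping incomplete rows, lowercasing a `label` column, casting an `amount` column to float) are hypothetical stand-ins for whatever your actual recipe does:

```python
import csv
import io

def apply_recipe(rows):
    """Re-implement hypothetical Dataprep recipe steps in plain Python:
    drop rows with missing values, lowercase the 'label' column,
    and cast 'amount' to float."""
    cleaned = []
    for row in rows:
        if any(v is None or v == "" for v in row.values()):
            continue  # step 1: remove rows with missing values
        row = dict(row)  # copy so the input rows are left untouched
        row["label"] = row["label"].lower()   # step 2: normalize text
        row["amount"] = float(row["amount"])  # step 3: type conversion
        cleaned.append(row)
    return cleaned

def to_csv(rows, fieldnames):
    """Serialize the cleaned rows as CSV, a format the TensorFlow model
    can consume and that can be staged in a Cloud Storage bucket."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Uploading the resulting CSV to a bucket then gives the model a stable input location, per the recommendation above.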

Related

How can we add complex preprocessing in AWS Sagemaker inference

I am using AWS SageMaker to deploy speech models trained outside of SageMaker. I am able to convert my model into something SageMaker understands and have deployed it as an endpoint. The problem is that SageMaker directly loads the model and calls .predict to get the inference. I am unable to figure out where I can add my preprocessing functions in the deployed model. It has been suggested to use AWS Lambda or another server for preprocessing. Is there any way I can incorporate complex preprocessing (which cannot be done with a simple scikit-learn- or Pandas-like framework) in SageMaker itself?
You will want to adjust the predictor.py file in the container that you are bringing your speech models in. Assuming you are using Bring Your Own Container (BYOC) to deploy these models on SageMaker, adjust the predictor code to include the preprocessing functionality you need. For any extra dependencies, make sure to update the Dockerfile you are bringing. Having the preprocessing functionality within the predictor file ensures your data is transformed and processed as you desire before predictions are returned.
This will add to the response time, however, so if you have heavy preprocessing workloads or ETL that needs to occur, you may want to look into a service such as AWS Glue (ETL) or Kinesis (real-time data streaming/transformation). If you choose to use Lambda, keep its 15-minute timeout limit in mind.
I work for AWS & my opinions are my own
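In the BYOC scaffold, predictions are served from predictor.py, and the essential change is to run your preprocessing inside the invocation handler before the model is called. A runnable sketch with the model and featurizer stubbed out; the ScoringService name follows the SageMaker BYOC example, while the stub model and the toy featurizer are hypothetical:

```python
class ScoringService:
    """Stand-in for the model wrapper in the SageMaker BYOC
    predictor.py scaffold. In a real container, get_model() would
    load the speech model from /opt/ml/model; a stub that sums each
    feature vector keeps this sketch runnable."""
    model = None

    @classmethod
    def get_model(cls):
        if cls.model is None:
            cls.model = lambda features: [sum(f) for f in features]
        return cls.model

def preprocess(raw_utterances):
    """Hypothetical heavy preprocessing that would otherwise live in
    Lambda or Glue: here, a toy featurizer turning each string into a
    [char_count, word_count] vector."""
    return [[len(u), len(u.split())] for u in raw_utterances]

def invocations(payload):
    """What the /invocations handler would do: preprocess first, then
    call the model, so predictions reflect the transformed data."""
    features = preprocess(payload)
    return ScoringService.get_model()(features)
```

Wiring `invocations` into the scaffold's HTTP route is then the only serving change; the Dockerfile just has to ship whatever the preprocessing imports.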

How to do Inference with Custom Preprocessing and Data Files on Google Cloud ML

I want to use a model that I have trained for inference on Google Cloud ML. It is an NLP model, and I want my node.js server to interact with the model to get predictions at inference time.
I have a process for running inference on the model manually that I would like to duplicate in the cloud:
1. Use Stanford CoreNLP to tokenize my text and generate data files that store the tokenized text.
2. Have the model use those data files, create TensorFlow Examples out of them, and run the model.
3. Have the model print out the predictions.
Here is how I think I can replicate it in the Cloud:
1. Send the text to the cloud using my node.js server.
2. Run my Python script to generate the data file. It seems I will have to do this inside a custom prediction routine; I'm not sure how I can use Stanford CoreNLP here.
3. Save the data file in a bucket in Google Cloud Storage.
4. In the custom prediction routine, load the saved data file and execute the model.
Can anyone tell me if this process is correct? Also, how can I run Stanford CoreNLP in a Google Cloud custom prediction routine? And is there a way for me to just run command-line scripts (for example, I normally create the data files with a simple command that I just run)?
You can implement a custom preprocessing method in Python and invoke the Stanford toolkit from there. See this blog and associated sample code for details: https://cloud.google.com/blog/products/ai-machine-learning/ai-in-depth-creating-preprocessing-model-serving-affinity-with-custom-online-prediction-on-ai-platform-serving
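For reference, a custom prediction routine is a Python class exposing a from_path() classmethod and a predict() method: the service calls from_path() once at startup and predict() per request, which is where the tokenization step belongs. A sketch with the model and tokenizer stubbed out; a real routine would load the SavedModel from model_dir and start a CoreNLP wrapper (e.g. via the stanfordnlp or stanza Python packages) instead of the lambdas used here:

```python
class NlpPredictor:
    """Sketch of an AI Platform custom prediction routine. Raw text
    arrives in predict(), so preprocessing happens per request, before
    the model runs."""

    def __init__(self, model, tokenizer):
        self._model = model
        self._tokenizer = tokenizer

    def predict(self, instances, **kwargs):
        # Tokenize each raw text exactly as done offline, then hand
        # the token lists to the model.
        tokenized = [self._tokenizer(text) for text in instances]
        return self._model(tokenized)

    @classmethod
    def from_path(cls, model_dir):
        # Stubs standing in for "load SavedModel" and "start CoreNLP":
        # the tokenizer lowercases and splits on whitespace, and the
        # model just counts tokens, to keep the sketch runnable.
        tokenizer = lambda text: text.lower().split()
        model = lambda batches: [len(tokens) for tokens in batches]
        return cls(model, tokenizer)
```

This also answers the command-line question indirectly: rather than shelling out to scripts, their logic is invoked as Python from inside predict().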

Exporting a model to be implemented in mobile app

We tested the Cloud AutoML Vision product, and the results are amazing: 96% accuracy.
What we did so far: uploaded a labeled dataset, trained, and evaluated, so we have a MODEL.
Next we want to export this model and implement it in an iOS app.
But how do we export from Cloud AutoML?
What formats are supported?
(Did we miss something? In the end we want a .mlmodel file; we can use a converter, but first we need to export to some format.)
The model export feature is currently not supported in Cloud AutoML Vision.
The team is aware of this feature request. You can star and keep an eye on: https://issuetracker.google.com/113122585 for updates.
The export functionality has since been added and is documented here: https://cloud.google.com/vision/automl/docs/deploy
It seems the easiest way to do so is in the UI.
You can export an image classification model in either generic TensorFlow Lite format, Edge TPU-compiled TensorFlow Lite format, or TensorFlow format.

Google App Engine Parse Logs in DataStore Save to Table

I am new to GAE and trying to quickly find a way to retrieve logs from Datastore, clean them to my specs, and then save them to a table to be called on later for a reports view in my app. I was thinking of using Google Cloud Dataflow and creating batch jobs (the app is Python/Django), but the documentation does not seem to fit my use case, so maybe Dataflow is not the answer. I could create a Python script with BigQuery and schedule it through cron, but then I would have to contend with errors, and it seems there should be a faster way to solve this problem.
Any help/thoughts/suggestions are always greatly appreciated.
You can use the Dataflow/Beam Python SDK to develop a pipeline that reads entities from Datastore [1], transforms the data, and writes a table to BigQuery [2]. To schedule this job to run regularly you'll have to use a third-party mechanism such as a cron job. Note that Dataflow performs automatic scaling and retries to handle errors, so you are not expected to manually address these complexities.
[1] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/datastore/v1/datastoreio.py
[2] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py
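The middle step of such a pipeline is just an ordinary Python function applied with beam.Map between the Datastore read and the BigQuery write. A sketch of that transform, with the log entity modeled as a plain dict of properties and the cleaning rules (status cast, query-string stripping) purely illustrative:

```python
def entity_to_row(entity):
    """Transform step for the Dataflow pipeline: turn a Datastore log
    entity (modeled here as a dict of properties) into a BigQuery-ready
    row. In the pipeline this would run as beam.Map(entity_to_row)
    between ReadFromDatastore and WriteToBigQuery."""
    return {
        "timestamp": entity["timestamp"],
        # Datastore properties may arrive as strings; BigQuery wants
        # a typed value for an INTEGER column.
        "status": int(entity["status"]),
        # Clean to spec: e.g. drop query strings from the request path.
        "resource": entity["path"].split("?")[0],
    }
```

Because the function is pure Python, it can be unit-tested outside the pipeline before being deployed with the Dataflow runner.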

Google cloud ML without trainer

Can we train a model by just giving data and related column names, without creating a trainer, in Google Cloud ML, using either the REST API or the command-line interface?
Yes. You can use Google Cloud Datalab, which comes with a structured data solution. It has an easier interface and takes care of the trainer for you. You can view the notebooks without setting up Datalab:
https://github.com/googledatalab/notebooks/tree/master/samples/ML%20Toolbox
Once you set up Datalab, you can run the notebook. To set up Datalab, check https://cloud.google.com/datalab/docs/quickstarts.
Instead of building a model and calling the Cloud ML service directly, you can try Datalab's ML Toolbox, which supports structured data and image classification. The ML Toolbox takes your data and automatically builds and trains a model. You just have to describe your data and what you want to do.
You can view the notebooks first without setting up datalab:
https://github.com/googledatalab/notebooks/tree/master/samples/ML%20Toolbox
To set up Datalab and actually run these notebooks, see https://cloud.google.com/datalab/docs/quickstarts.