Google Vertex AI image AutoML classification when an important image feature is text inside the image - google-cloud-platform

I'd like to do image classification. In my dataset, image features (colors, shapes, etc.) are a strong signal for this classification, but some categories of images will be hard to distinguish without interpreting the text inside the image.
I don't think Vertex AI/AutoML will use pre-trained models to help classification when, in some cases, the only difference is the text. I know Google Cloud Vision/OCR is capable of doing such extraction. But is there a way to do image classification (Vertex AI/AutoML) using the Google Cloud Vision extraction as an additional image feature?
Currently my project uses 3 models (no google cloud):
model 1: classify an image using images features
model 2: classify an image, only using OCR + regex (same categories)
model 3: combine both models and decide when to use model 1 or model 2
I'd like to switch to Vertex AI because it would improve my project quality for the following reasons:
AutoML classification seems very good for model 1
I need to use a tool to manage my datasets (Vertex AI managed dataset)
Vertex AI has interesting pipeline training features
If it is confirmed that AutoML won't perform well when some image categories differ only in the text, I would recreate a similar 3-tier set of models using Vertex AI custom training scripts. I can easily create model 1 with Vertex AI/AutoML. However, I have no idea if:
I can create model 2 with a vertex ai custom training script using google cloud vision/ocr to do image classification
I can create model 3 that would use models 1 and 2 created by vertex ai.
Could you give me recommendations on how to achieve that using Google Cloud Platform?

For this purpose, I recommend the following:
1. model 2:
Keep your images in a GCS.
Use the Cloud Vision API ("Detect text in images") to generate your text dataset: {"gcs":"gs://path_to_image/image_1","text":["text1"...]}.
Then classify this text dataset: train AutoML on it, apply a regexp to it, load it into a BigQuery dataset and query it, and so on.
2. model 3:
I would follow a similar approach: process the images with the Cloud Vision API and generate a text dataset. This time, images that don't contain any text produce records with an empty "text" field: {"gcs":"gs://path_to_image/image_2","text":[]}. Your own script can then separate the records with text from those without, generating one dataset for model 2 and one for model 1.
I see that your models 2 and 3 are not strictly classification problems. Model 2 is an OCR problem, after which you process the output data. Model 3 basically processes your data and separates it into the proper datasets.
I hope this insight may help you.
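The record generation and the split described above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the google-cloud-vision Python client is installed and authenticated, and the helper names (vision_ocr, build_records, split_records) are hypothetical. The OCR function is passed in as a parameter so the splitting logic can be tried without calling the API.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def vision_ocr(gcs_uri: str) -> List[str]:
    """Return the text detected in one image stored in GCS (assumed helper)."""
    # Imported lazily so the rest of the sketch runs without the client library.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    image = vision.Image(source=vision.ImageSource(image_uri=gcs_uri))
    response = client.text_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    # The first annotation holds the full detected text; later ones are single words.
    return [a.description for a in response.text_annotations[:1]]


def build_records(
    uris: Iterable[str], ocr: Callable[[str], List[str]] = vision_ocr
) -> List[Dict]:
    """Build one {"gcs": ..., "text": [...]} record per image."""
    return [{"gcs": uri, "text": ocr(uri)} for uri in uris]


def split_records(records: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Separate records with text (model 2) from records without (model 1)."""
    with_text = [r for r in records if r["text"]]
    without_text = [r for r in records if not r["text"]]
    return with_text, without_text
```

Each record list can then be written out as JSON lines (one json.dumps(record) per line) and used as the input for the corresponding model.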

Related

Possible to do incremental training with AWS comprehend?

I am looking at AWS Comprehend for a text classification task which will involve an active learning component. I am trying to understand if it's possible to incrementally train a custom Comprehend model using batches of newly annotated data, or if it only supports training from scratch. In this blog post it sounds like they are stitching the annotated data back together with the original training data (i.e. retraining from scratch each time), but I don't see the mentioned CloudFormation template (part 1 has the template for training/deployment, but part 2 seems to be talking about another template).
Is it possible to do incremental training with Comprehend? Or would I need to use a custom text classification model through SageMaker and then do incremental training that way? I am attempting to do the following
Get a pretrained model
Fine tune on own classification data
Incrementally train on annotated low confidence predictions
1 and 2 can be done with AWS Comprehend, but not sure about 3. Thanks

Import Labeled Data from vott to Google Cloud AutoML

I want to create a classifier, and I do not like Google's browser labeling service. Is there a tool similar to VoTT, or some code, that I can use to import my VoTT-labeled data into Google AutoML?
The Google labeling service is very slow to load images and inefficient: it has a white labeling cursor, and my images have a light background.
On the other hand, VoTT is much better in every way. So is there a way to import the labeled CSV from VoTT into Google Cloud AutoML?
I don't think that it is currently possible to import already labeled data from other apps (like VOTT).
At the moment there are 3 ways to label images in Cloud Vision. They are described in "Annotating imported training images":
Provide bounding boxes with labels for your training images via labeled bounding boxes in your .csv import file
In the CSV file you would need to provide GCS url and label/labels
Labeled: gs://my-storage-bucket-vcm/flowers/images/img100.jpg,daisy
Multi-label: gs://my-storage-bucket-vcm/flowers/images/img384.jpg,dandelion,tulip,rose
Assigned to a set: TEST,gs://my-storage-bucket-vcm/flowers/images/img805.jpg,daisy
More details can be found here.
Provide unannotated images in your .csv import file and use the UI to provide image annotations
Not labeled: gs://my-storage-bucket-vcm/flowers/images/img403.jpg
However, later you will need to label it using UI, otherwise it will be ignored.
AutoML Vision ignores items without a category label.
Request manual image annotation with Google's Human Labeling service
This option involves human labelers, and you would need to provide additional information such as the dataset, the label set, and instructions for the labelers.
The documentation also states that the API currently does not support any method for labeling:
The AutoML API does not currently include methods for labeling.
However, you can file a Feature Request via IssueTracker to add import methods from other apps or to enable labeling through the API.
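Since the .csv import accepts labels directly, one possible workaround is to convert a VoTT export into the line format shown above. The following is only a sketch under assumptions: it assumes the VoTT CSV export has the header image,xmin,ymin,xmax,ymax,label, that the images have already been uploaded under a single GCS prefix, and that image-level labels (not bounding boxes) are wanted. The function name vott_to_automl_lines is hypothetical.

```python
import csv
import io
from typing import Dict, List


def vott_to_automl_lines(vott_csv_text: str, gcs_prefix: str) -> List[str]:
    """Turn VoTT CSV rows into 'gs://path,label[,label...]' import lines."""
    reader = csv.DictReader(io.StringIO(vott_csv_text))
    labels_by_image: Dict[str, List[str]] = {}
    for row in reader:
        labels = labels_by_image.setdefault(row["image"], [])
        # Several boxes with the same label collapse into one image-level label.
        if row["label"] not in labels:
            labels.append(row["label"])
    prefix = gcs_prefix.rstrip("/")
    return [
        ",".join([f"{prefix}/{image}", *labels])
        for image, labels in labels_by_image.items()
    ]
```

The resulting lines can be written to a .csv file in the bucket and used as the import file; an image with several distinct labels becomes a multi-label line.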

Using google cloud for image classification, cropping and OCR

Please allow me to ask a rather newbie question. So far, I have been using local tools like imagemagick or GOCR to perform the job, but that is rather old-fashioned, and I am urged to "move to google cloud AI".
The setup
I have a (training) data set of various documents (as JPG and PDF) of different kinds, and by certain features (like prevailing color, repetitive layout) I intend to classify them, e.g. as invoice type 1, invoice type 2, not an invoice. In a 2nd step, I would like to OCR certain predefined areas of each document and extract e.g. the address of the company sending the invoice and the date.
The architecture I am envisioning
In a modern platform as a service (PaaS), I have already set up a UI where I can upload new files. These are then stored locally in a directory with filenames (or in a MongoDB). Meta info like upload timestamp, user, and original file name is stored in a DB.
The newly uploaded file should then be submitted to Google Cloud, which should do the classification step and deliver the label back to be saved in the database.
The document pages should be auto-cropped, i.e. black or white margins are removed, most probably with google cloud as well. The parameters of the crop should be persisted in the DB.
In case it is e.g. an invoice, OCR should be performed (again by Google Cloud) for certain regions of the document, e.g. a bounding box spanning from the middle of the page to the right margin in the upper 10% of the cropped page. The results of the OCR should again be persisted locally.
The problem
I seem to be missing the correct search term to figure out how to do it with Google Cloud. Is there a Google API (e.g. REST) that I can use to upload files and that gives me back the results of steps 2 to 4?
I think that your best option here is to use Document AI (REST API and Libraries).
Using Document AI, you can:
Convert images to text
Classify documents
Analyze and extract entities
Additionally, for your use case, there is a new Document AI feature, the Invoice parser, which is still in preview and has limited access.
Invoice parser is similar to Form parser but for invoices instead of forms. Check out the Invoice parser page and you will see what I mean by preview and limited access.
AFAIK, there isn't any GCP tool for image editing.
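Restricting OCR output to a predefined region, such as the one in the question (middle of the page to the right margin, upper 10%), can be done client-side once word-level bounding boxes are available. This is a sketch only: the Word tuple and words_in_region are hypothetical names, and it assumes the OCR service returns pixel coordinates for each word on the cropped page.

```python
from typing import Iterable, List, NamedTuple


class Word(NamedTuple):
    """One OCR word with its pixel bounding box (assumed structure)."""
    text: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def words_in_region(
    words: Iterable[Word],
    page_width: float,
    page_height: float,
    left: float = 0.5,    # region starts at the middle of the page
    top: float = 0.0,
    right: float = 1.0,   # ... and extends to the right margin
    bottom: float = 0.1,  # ... within the upper 10% of the page
) -> List[str]:
    """Keep the words whose box center falls inside the normalized region."""
    x0, x1 = left * page_width, right * page_width
    y0, y1 = top * page_height, bottom * page_height
    selected = []
    for w in words:
        cx = (w.x_min + w.x_max) / 2
        cy = (w.y_min + w.y_max) / 2
        if x0 <= cx <= x1 and y0 <= cy <= y1:
            selected.append(w.text)
    return selected
```

Storing the region as normalized fractions rather than pixels keeps the same parameters usable across pages of different sizes, and they can be persisted in the DB alongside the crop parameters.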

Google Cloud Vision - Data Set cannot be split into train, validation, test data

I am trying to build an object detection model using Google Cloud Vision. The model should draw bounding boxes around rice
What I have done so far:
I have imported an image set of 15 images
I have used the Google Cloud tool to draw ~550 bounding boxes in 10 images
Where I am stuck:
I have built models before, and the data set was automatically split into train, validation and test set. This time however, Google Cloud is not splitting the data set.
What I have tried:
Downloading the .csv with the labeled data and reimporting it into Google Cloud
Adding more labels beyond the one label I have right now
Deleting and recreating the data set
How can I get Google Cloud to split the data set?
Your problem is that Google Cloud Platform determined your train, test and validation sets when you uploaded your images. Your test and validation images are likely your last 5 images which are not available for training if you have not labeled them yet. If you label all of your images or remove those from the dataset you should be able to train. See this SO answer for more info.
You can verify this by clicking the Export Data option and downloading a CSV of your dataset: you can see that the data set categories are already defined, even for images that have not been labeled yet.
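To check the exported CSV quickly, a short script can count labeled and unlabeled images per set. This is a sketch under assumptions: it assumes each row starts with the set name (TRAIN, VALIDATION or TEST), followed by the image path and then the label, and that the label column is empty or missing for unlabeled images. The function name summarize_split is hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict


def summarize_split(export_csv_text: str) -> Dict[str, Dict[str, int]]:
    """Count distinct labeled and unlabeled images in each data set category."""
    seen = defaultdict(set)
    labeled = defaultdict(set)
    for row in csv.reader(io.StringIO(export_csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        ml_use, path = row[0].strip(), row[1].strip()
        seen[ml_use].add(path)
        # An image counts as labeled if any of its rows carries a label.
        if len(row) > 2 and row[2].strip():
            labeled[ml_use].add(path)
    return {
        ml_use: {
            "labeled": len(labeled[ml_use]),
            "unlabeled": len(paths - labeled[ml_use]),
        }
        for ml_use, paths in seen.items()
    }
```

If the VALIDATION or TEST entries show zero labeled images, that would match the explanation above: those images were assigned to a set at import time but cannot be used until they are labeled.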

How to make Automl model with Healthcare Entity extraction as base model?

I am facing a problem while making a custom model with AutoML. I am supplying AutoML with JSONL training data with the label DISEASE.
My service account has the permission healthcare.nlpservice.analyzeEntities, and before starting training I choose the option to Enable Healthcare Entity Extraction.
But still after 4+ hours of training the model detects only the DISEASE label.
It is not detecting Problems, Procedures, etc.
I am following the steps mentioned in the documentation.
Attached photo of the service account's permission(no utilization analysis)
Can anyone please point me in the right direction....