How to do OCR with Google's AutoML - google-cloud-platform

I want to do OCR and I know that Cloud Vision API supports it. But I'm interested in making my custom model for it and wish to use AutoML for the same. But I couldn't find anything related to OCR using AutoML. Is it possible to do OCR using AutoML? How do we go about this? I know this is a very open-ended question, but I'd appreciate some help.

AutoML Natural Language can perform OCR on PDFs; however, this is just a preprocessing step, because the product is intended for building your own models for text classification, entity extraction, or sentiment analysis.
If your goal is just to perform OCR, the best approach is the Vision API.

You cannot do OCR from AutoML. Your options are to use the Cloud Vision API to do OCR and then apply your own algorithms to put the detected letters together in a certain way, or to start from scratch and train your own OCR model (not recommended).
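To illustrate the first option, "putting the detected letters together in a certain way" usually means grouping word-level detections back into lines using their bounding-box positions. Here is a minimal sketch; the input format only loosely mirrors Vision API text annotations, and the sample data is made up:

```python
# Sketch: grouping word-level OCR detections into text lines by position.
# Each entry carries the detected text and the centre of its bounding box.

def group_into_lines(words, line_tolerance=10):
    """Group detected words into text lines by vertical position."""
    lines = []
    for word in sorted(words, key=lambda w: (w["y"], w["x"])):
        # Append to an existing line if the vertical centres are close enough.
        for line in lines:
            if abs(line[0]["y"] - word["y"]) <= line_tolerance:
                line.append(word)
                break
        else:
            lines.append([word])
    # Within each line, order words left to right before joining.
    return [" ".join(w["text"] for w in sorted(line, key=lambda w: w["x"]))
            for line in lines]

detections = [
    {"text": "World", "x": 120, "y": 52},
    {"text": "Hello", "x": 10, "y": 50},
    {"text": "OCR", "x": 15, "y": 110},
]
print(group_into_lines(detections))  # ['Hello World', 'OCR']
```

A fixed pixel tolerance is the simplest heuristic; real documents with skew or varying font sizes would need something more robust.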

Related

How does PaddleOCR performance compare to Google Cloud Vision OCR API?

Recently I found an OCR tool called PaddleOCR. Has anyone used it, and how does its performance compare to the Google Cloud Vision API?
PaddleOCR describes itself as an industry-grade open-source OCR engine, so I tested a few images with both it and Google Cloud Vision.
Generally speaking, commercial APIs like Google Cloud and Azure are expected to work better than open-source OCR engines, and they do, but in some scenarios the gap is not that large.
If the text is clear and flat, both work great. The main difference is the result format. Google API gives you rich content including block, paragraph, and word location information. PaddleOCR only returns the result according to the text line (transcriptions and locations).
If your test images are more complicated, with curved text, handwriting, or blur, the commercial APIs will probably work better than the open-source engine. However, if they can't meet your needs, you can try training a new model with PaddleOCR.
Here are some visualization results from PaddleOCR and the Google Cloud Vision API on the same two test images (images not reproduced here).

Does google cloud vision API output a confidence level when using character recognition?

I am doing OCR using the API of Google Cloud vision.
To make it easier to check the results, I'd like to visualize which parts of the output need closer review and which can be trusted, based on how confident the API is.
I couldn't find anything in my searching, but does the API have the ability to output a confidence level? It would be much appreciated if you could tell me.
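For what it's worth, the fullTextAnnotation returned by a DOCUMENT_TEXT_DETECTION request does carry a confidence field on its pages, blocks, paragraphs, words, and symbols. A minimal sketch of pulling word-level confidences out of such a response, with entirely made-up sample data in place of a real API call:

```python
# Sketch: extracting word-level confidence from a Vision API
# fullTextAnnotation-style JSON response. The sample response below is
# made up; a real one comes from a DOCUMENT_TEXT_DETECTION request.

sample_response = {
    "fullTextAnnotation": {
        "pages": [{
            "blocks": [{
                "paragraphs": [{
                    "words": [
                        {"confidence": 0.98,
                         "symbols": [{"text": "H"}, {"text": "i"}]},
                        {"confidence": 0.41,
                         "symbols": [{"text": "t"}, {"text": "o"}]},
                    ]
                }]
            }]
        }]
    }
}

def word_confidences(response, threshold=0.8):
    """Yield (word_text, confidence, needs_review) for every detected word."""
    for page in response["fullTextAnnotation"]["pages"]:
        for block in page["blocks"]:
            for paragraph in block["paragraphs"]:
                for word in paragraph["words"]:
                    text = "".join(s["text"] for s in word["symbols"])
                    conf = word["confidence"]
                    yield text, conf, conf < threshold

for text, conf, review in word_confidences(sample_response):
    print(f"{text}: {conf:.2f}{' <- review' if review else ''}")
```

Flagging words below a confidence threshold like this is one easy way to build the "where to be more careful" visualization the question asks about.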

Google AutoML Video Intelligence Tools?

I'm using AutoML Video Intelligence and it's very tedious and I was wondering if there was an easier way to create Datasets for the object tracking. An easy way to get the time and position of the box?
I'm pretty sure you can find answers to these questions in the GCP documentation, in particular for the AutoML Video Intelligence product.
The object tracking process, at least, is nicely explained there, whether you implement it through the GCP console UI or by constructing HTTP calls to the Cloud AutoML REST API.
Furthermore, you can find an example showing how to specify video segment positions for the relevant prediction requests.
You could also edit your original question with specific details about your use case so the answer can address it more precisely.

Google Cloud AutoML Natural Language for Chatbot like application

I want to develop a chatbot like application which gives response to input questions using Google Cloud Platform.
Naturally, Dialogflow is suited for such applications. But due to business constraints, I cannot use Dialogflow.
An alternative could be AutoML Natural Language, where I do not need much machine learning expertise.
AutoML Natural Language requires documents which are labelled. These documents can be used for training a model.
My example document:
What is cost of Swiss tour?
Estimate of Switzerland tour?
I would use a label such as Switzerland_Cost for this document.
Now, in my application I would have a mapping between Labels and Responses.
During Prediction, when I give an input question to the trained model, I would get a predicted label. I can then use this label to return the mapped response.
Is there a better approach to my scenario?
I'm from the AutoML team. This seems like a good approach to me. People use AutoML NL for intent detection, which is pretty much aligned with what you're trying to do here.
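The label-to-response mapping described in the question can be sketched as below. Here predict_label is only a stand-in for the trained AutoML Natural Language model (a real application would call the AutoML prediction API instead), and the labels and response texts are made-up examples:

```python
# Sketch: routing a predicted intent label to a canned response.
# predict_label fakes the trained AutoML NL classifier; in a real app
# this function would call the AutoML prediction API.

RESPONSES = {
    "Switzerland_Cost": "A typical Swiss tour costs about $2,000 per person.",
    "Switzerland_Duration": "Most Switzerland tours run 7 to 10 days.",
}

def predict_label(question):
    """Fake classifier standing in for the AutoML NL model (illustration only)."""
    return "Switzerland_Cost" if "cost" in question.lower() else "Switzerland_Duration"

def answer(question, fallback="Sorry, I can't help with that yet."):
    label = predict_label(question)
    return RESPONSES.get(label, fallback)

print(answer("What is cost of Swiss tour?"))
```

Keeping the responses in a plain mapping keyed by label, with a fallback for unknown labels, keeps the chatbot logic decoupled from the model itself.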

aws sagemaker for detecting text in an image

I am aware that it is better to use AWS Rekognition for this. However, it does not seem to work well with the images I have (which are sort of like small containers with labels on them). The text comes out misspelled and fragmented.
I am new to ML and SageMaker. From what I have seen, the use cases seem to be prediction and image classification. I could not find one on training a model to detect text in an image. Is it possible to do this with SageMaker? I would appreciate it if someone pointed me in the right direction.
The different services will all provide different levels of abstraction for Optical Character Recognition (OCR) depending on what parts of the pipeline you are most comfortable with working with, and what you prefer to have abstracted.
Here are a few options:
Rekognition will provide out of the box OCR with the DetectText feature. However, it seems you will need to perform some sort of pre-processing on your images in your current case in order to get better results. This can be done through any method of your choice (Lambda, EC2, etc).
SageMaker is a tool that will enable you to easily train and deploy your own models (of any type). You have two primary options with SageMaker:
Do-it-yourself option: If you're looking to go the route of labeling your own data, gathering a sizable training set, and training your own OCR model, this is possible by training and deploying your own model via SageMaker.
Existing OCR algorithm: There are many algorithms out there that all have different potential tradeoffs for OCR. One example would be Tesseract. Using this, you can more closely couple your pre-processing step to the text detection.
Amazon Textract (In preview) is a purpose-built dedicated OCR service that may offer better performance depending on what your images look like and the settings you choose.
I would personally recommend looking into pre-processing for OCR to see if it improves Rekognition accuracy before moving onto the other options. Even if it doesn't improve Rekognition's accuracy, it will still be valuable for most of the other options!
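As one concrete example of the pre-processing suggested above, binarizing a low-contrast label image before OCR often helps. A minimal sketch in plain Python; a real pipeline would use a library such as Pillow or OpenCV, and the tiny grayscale "image" here is made-up sample data:

```python
# Sketch: simple global-threshold binarization, a common OCR
# pre-processing step. Real pipelines would use Pillow or OpenCV;
# the grayscale "image" below is made-up sample data.

def binarize(gray, threshold=128):
    """Map each grayscale pixel to pure black (0) or white (255)."""
    return [[255 if px >= threshold else 0 for px in row] for row in gray]

image = [
    [200, 180,  90,  30],
    [210,  60,  70,  25],
    [190, 185, 200,  20],
]
print(binarize(image))
```

A fixed global threshold is the crudest variant; adaptive thresholding usually works better on unevenly lit container labels, but the idea is the same.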