Does AWS Comprehend classify images? - amazon-web-services

I am fairly new to AWS Comprehend. I know that AWS Comprehend can custom-classify documents (text files). Does AWS Comprehend also classify image files? Also, while training the model, is it necessary to give the entire document text in the CSV, or will just keywords do?
The reason I ask is that I want to build a custom classifier that can classify invoices, pay stubs and a few other such document types, which are in image formats. Can Comprehend do this? If so, how?
I Googled quite a lot but couldn't find anything relevant. I'd really appreciate your help with this.
Thank you!

Comprehend doesn't do this natively, so you would have to build a solution. Something you could try is to combine Amazon Textract (for extracting the details from the documents) and then Comprehend to classify them.
The Textract FAQ calls this out as a common use case. I couldn't find an exact example of someone doing this, but it is directly called out in the documentation.

Amazon Comprehend only works on text.
Amazon Rekognition works on images.

AWS has all the building blocks to accomplish this, but you will have to configure/build this yourself. You can use AWS Textract to extract all the text from a document, and then pass the text into the AWS Comprehend service to do the classification for document type.
Before you can do this, you need to train Comprehend to identify your document types correctly. Configure and train a custom classifier in AWS Comprehend by supplying a CSV file in which each row pairs a classification (for example, the document type) with text that would appear in that type of document. If the documents are just forms, you can use the Textract Forms feature to get only the key-value pairs, then use the keys (the labels on the form) as the text for the custom classifier.
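The flow described above can be sketched roughly as follows. This is only a minimal sketch, assuming boto3, a two-column header-less training CSV (label first, text second), and a custom classifier that has already been trained and deployed to a real-time endpoint. The helper names, the file name and the endpoint ARN placeholder are hypothetical; uploading the CSV to S3 and starting the training job are not shown.

```python
import csv
import io


def extract_text(textract_client, image_bytes):
    """Return the text Textract detects in one image, one detected line per row."""
    response = textract_client.detect_document_text(Document={"Bytes": image_bytes})
    lines = [
        block["Text"]
        for block in response["Blocks"]
        if block["BlockType"] == "LINE"  # skip PAGE and WORD blocks
    ]
    return "\n".join(lines)


def build_training_csv(examples):
    """Build CSV text from (label, document_text) pairs, with no header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for label, text in examples:
        # Collapse whitespace so each document stays on a single CSV row.
        writer.writerow([label, " ".join(text.split())])
    return buffer.getvalue()


def classify_text(comprehend_client, endpoint_arn, text):
    """Return (label, score) for the highest-scoring class."""
    response = comprehend_client.classify_document(Text=text, EndpointArn=endpoint_arn)
    best = max(response["Classes"], key=lambda c: c["Score"])
    return best["Name"], best["Score"]


def main():
    # Needs AWS credentials; the path and ARN below are placeholders.
    import boto3

    textract = boto3.client("textract")
    comprehend = boto3.client("comprehend")
    with open("scanned-document.png", "rb") as f:  # hypothetical file
        text = extract_text(textract, f.read())
    print(classify_text(comprehend, "<your-classifier-endpoint-arn>", text))
```

The same extract_text helper can be run over a set of labelled sample images to produce the (label, text) pairs that build_training_csv expects.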

Related

AWS Rekognition vs AWS visual search

Currently, I need to implement searching for products by image in my app. After doing some research, I want to go with AWS Rekognition: when the model predicts a label for the image, I can pass the predicted label to my API to query products. This is what I plan to do. However, I also came across AWS visual search (using AWS SageMaker), which is way beyond my understanding. So, am I on the right track by going with the first option (AWS Rekognition)?
Amazon Rekognition is 'out-of-the-box' image recognition. It can label pictures, find faces, read text, etc. It accepts custom labels; however, it is not possible to modify the general recognition process.
Amazon SageMaker is a machine learning platform for building your own models. It is highly flexible, for everything from image recognition through to predictive analytics. However, it is quite complex and is usually used by Data Scientists.
Given your knowledge levels, Amazon Rekognition would be a better choice for you.
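As a rough illustration of the plan in the question (get labels from Rekognition, then pass them to your own product API), here is a minimal sketch assuming boto3. The helper name and the default thresholds are hypothetical, and the product lookup itself is left to your API.

```python
def top_labels(rekognition_client, image_bytes, max_labels=5, min_confidence=80.0):
    """Return label names Rekognition assigns to the image, as returned by the service."""
    response = rekognition_client.detect_labels(
        Image={"Bytes": image_bytes},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )
    return [label["Name"] for label in response["Labels"]]


def main():
    # Needs AWS credentials; the file name is a placeholder.
    import boto3

    rekognition = boto3.client("rekognition")
    with open("product-photo.jpg", "rb") as f:  # hypothetical file
        labels = top_labels(rekognition, f.read())
    print(labels)  # pass these to your own product search API
```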

How to customise AWS Textract?

So far my Textract tests are very impressive for handwriting, but I see that it sometimes fails to recognise some forms and some values. Is it possible to train it? If I'm scanning the same type of form/document, it would be very useful to amend the results and teach it where the boundaries of some form elements lie, as well as some key-value associations.
It will be a real deal breaker for the kind of service I'm trying to design.
Thanks in advance.
No. It is not possible to 'train' Amazon Textract.
The available actions are limited to analysing a document and detecting text.
See: Actions - Amazon Textract
I know this is an old post but I am working on a project to do exactly this. You can look at this Hugging Face model and the referenced model in Github:
https://huggingface.co/docs/transformers/model_doc/layoutlmv2
It isn't simple but it's the only open source solution I know of.

Amazon Textract vs Amazon Rekognition DetectText

How do I decide when to use Amazon Textract vs Amazon Rekognition's TextDetect method?
My use case is to take a picture on a mobile device, convert the image data into text, and store it in AWS RDS.
https://aws.amazon.com/blogs/aws/amazon-rekognition-image-detection-and-recognition-powered-by-deep-learning/
https://aws.amazon.com/textract/
With respect to end-to-end problem solving, Textract will perform better because it is more fully featured for OCR. If you're simply trying to pull a line or two of text from a picture shot in the wild, like street signs or billboards (i.e., not a document or form), I'd recommend Amazon Rekognition.
Amazon Textract is a newer AWS service that was created as a purpose-built solution to the problem of OCR (optical character recognition) in images of documents and PDFs. While Rekognition is a more generalizable computer vision service, Textract has many more OCR-oriented tuning parameters to optimize the process of accurately and effectively extracting text.
Out of the box, if all you are trying to do is detect text and the relevant metadata (coordinates, angle, confidence value), the Rekognition DetectText method will likely perform similarly to the equivalent text-detection method in Textract (detect_document_text). However, Textract's analyze_document method offers further semantic structuring, such as form key-value pairs and tables, which helps with text curation and formatting and replaces post-processing that the developer would traditionally need to write themselves.
Lastly, when comparing the costs of the two Detect Text methods, Textract costs a bit more ($1.50/1k images) compared to Rekognition ($1.00/1k images).
If there is simply random text in the picture, then use Amazon Rekognition. It will find text in any location.
Amazon Textract is designed for converting paper documents into organized data. It will probably not work well with a random picture (although I haven't tried it so I can't be certain!).
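To compare the two services on your own photos, something like the following could be a starting point. It is a minimal sketch assuming boto3; the helper names and the confidence cutoff are hypothetical, and writing the result to RDS is not shown.

```python
def rekognition_lines(rekognition_client, image_bytes, min_confidence=80.0):
    """Return lines of text found by Rekognition DetectText above a confidence cutoff."""
    response = rekognition_client.detect_text(Image={"Bytes": image_bytes})
    return [
        detection["DetectedText"]
        for detection in response["TextDetections"]
        # DetectText returns both LINE and WORD entries; keep whole lines only.
        if detection["Type"] == "LINE" and detection["Confidence"] >= min_confidence
    ]


def textract_lines(textract_client, image_bytes):
    """Return lines of text found by Textract's plain text detection."""
    response = textract_client.detect_document_text(Document={"Bytes": image_bytes})
    return [
        block["Text"]
        for block in response["Blocks"]
        if block["BlockType"] == "LINE"
    ]


def main():
    # Needs AWS credentials; the file name is a placeholder.
    import boto3

    with open("photo.jpg", "rb") as f:  # hypothetical file
        image_bytes = f.read()
    print(rekognition_lines(boto3.client("rekognition"), image_bytes))
    print(textract_lines(boto3.client("textract"), image_bytes))
```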

aws sagemaker for detecting text in an image

I am aware that it is better to use aws Rekognition for this. However, it does not seem to work well when I tried it out with the images I have (which are sort of like small containers with labels on them). The text comes out misspelled and fragmented.
I am new to ML and SageMaker. From what I have seen, the use cases seem to be for prediction and image classification. I could not find one on training a model for detecting text in an image. Is it possible to do it with SageMaker? I would appreciate it if someone pointed me in the right direction.
The different services will all provide different levels of abstraction for Optical Character Recognition (OCR) depending on what parts of the pipeline you are most comfortable with working with, and what you prefer to have abstracted.
Here are a few options:
- Rekognition will provide out-of-the-box OCR with the DetectText feature. However, it seems you will need to perform some sort of pre-processing on your images in your current case in order to get better results. This can be done through any method of your choice (Lambda, EC2, etc).
- SageMaker is a tool that will enable you to easily train and deploy your own models (of any type). You have two primary options with SageMaker:
  - Do-it-yourself option: If you're looking to go the route of labeling your own data, gathering a sizable training set, and training your own OCR model, this is possible by training and deploying your own model via SageMaker.
  - Existing OCR algorithm: There are many algorithms out there that all have different potential tradeoffs for OCR. One example would be Tesseract. Using this, you can more closely couple your pre-processing step to the text detection.
- Amazon Textract (in preview at the time of writing) is a purpose-built dedicated OCR service that may offer better performance depending on what your images look like and the settings you choose.
I would personally recommend looking into pre-processing for OCR to see if it improves Rekognition accuracy before moving onto the other options. Even if it doesn't improve Rekognition's accuracy, it will still be valuable for most of the other options!

In dialogflow, how to upload file containing questions and answers programmatically?

Can we upload training data (in a .txt file) to Dialogflow or Google Cloud Platform using Python code, via the Detect Intent and Agent APIs? If so, please share your insights.
You can look at using a PUT request to add additional training data to your intents. However, there is not a direct option to upload a text file. Generally Dialogflow does a really good job of interpreting the user's intent with just a handful of training samples, making it feasible to type each in manually or copy & paste. As it uses machine learning to match similar phrases, it shouldn't be necessary to upload a large text file.
Yes, for training phrases you can upload one .txt file (one line per phrase) or a zipped archive of multiple .txt files (there's a limit of 10).
There's more on this here in the docs.
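If you do want to go the programmatic route, one way to prepare the data is to turn a one-phrase-per-line text file into the training-phrase structure of an intent update request. This is only a sketch: the function names are hypothetical, the field names (trainingPhrases, type, parts, text) follow the Dialogflow ES v2 Intent resource as I understand it, and authenticating and sending the request are not shown.

```python
import json


def phrases_from_text(text):
    """Return unique, non-empty phrases, one per input line, in original order."""
    seen = set()
    phrases = []
    for line in text.splitlines():
        phrase = line.strip()
        if phrase and phrase not in seen:
            seen.add(phrase)
            phrases.append(phrase)
    return phrases


def training_phrases_payload(phrases):
    """Build the training-phrase part of an intent update body (assumed v2 shape)."""
    return {
        "trainingPhrases": [
            {"type": "EXAMPLE", "parts": [{"text": phrase}]}
            for phrase in phrases
        ]
    }


def main():
    with open("training_phrases.txt", encoding="utf-8") as f:  # hypothetical file
        payload = training_phrases_payload(phrases_from_text(f.read()))
    print(json.dumps(payload, indent=2))
```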