How do I decide when to use Amazon Textract vs Amazon Rekognition's DetectText method?
My use case is to take a picture with a mobile phone, convert the image to text, and store the result in Amazon RDS.
https://aws.amazon.com/blogs/aws/amazon-rekognition-image-detection-and-recognition-powered-by-deep-learning/
https://aws.amazon.com/textract/
With respect to end-to-end problem solving, Textract will perform better because it is more fully featured for OCR. If you're simply trying to pull a line or two of text from a picture shot in the wild, such as a street sign or billboard (i.e., not a document or form), I'd recommend Amazon Rekognition.
Amazon Textract is a newer AWS service that was created as a purpose-built solution to the problem of OCR (optical character recognition) in images of documents and PDFs. While Rekognition is a more generalizable computer vision service, Textract has many more OCR-oriented tuning parameters to optimize the process of accurately and effectively extracting text.
Out of the box, if all you are trying to do is detect text and the relevant metadata (coordinates, angle, confidence value), the Rekognition DetectText method will likely perform similarly to the equivalent text-detection method in Textract (detect_document_text). However, Textract's analyze_document method offers further semantic structuring, such as forms and tables, that helps with text curation and formatting and replaces post-processing that the developer would traditionally need to write themselves.
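As a rough illustration of the comparison above, the sketch below sends the same image to both services through boto3 and reduces each response to a list of (text, confidence) lines. The helper names (rekognition_lines, textract_lines, detect_with_both) are made up for this example. The boto3 calls detect_text and detect_document_text and the response fields are the documented ones, but credentials, region, and error handling are assumed to be configured elsewhere.

```python
def rekognition_lines(response):
    """Return (text, confidence) for each LINE in a Rekognition DetectText response."""
    return [
        (d["DetectedText"], d["Confidence"])
        for d in response.get("TextDetections", [])
        if d.get("Type") == "LINE"
    ]


def textract_lines(response):
    """Return (text, confidence) for each LINE block in a Textract response."""
    return [
        (b["Text"], b["Confidence"])
        for b in response.get("Blocks", [])
        if b.get("BlockType") == "LINE"
    ]


def detect_with_both(image_bytes):
    """Run both services on the same image bytes (hypothetical helper).

    Assumes AWS credentials and a default region are already configured.
    """
    import boto3  # imported lazily so the parsing helpers work without the SDK

    rekognition = boto3.client("rekognition")
    textract = boto3.client("textract")
    rek_response = rekognition.detect_text(Image={"Bytes": image_bytes})
    tex_response = textract.detect_document_text(Document={"Bytes": image_bytes})
    return rekognition_lines(rek_response), textract_lines(tex_response)
```

Running detect_with_both on a handful of representative photos is a cheap way to see which service copes better with your own images before committing to one.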
Lastly, when comparing the costs of the two Detect Text methods, Textract costs a bit more ($1.50/1k images) compared to Rekognition ($1.00/1k images).
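To make the price difference concrete, here is a small worked calculation. The per-1,000-image prices are the ones quoted above and may have changed since; the 10,000-image volume is a made-up example, and free tiers or volume discounts are ignored.

```python
# Prices per 1,000 images as quoted in the answer above (may be out of date).
TEXTRACT_PER_1K = 1.50
REKOGNITION_PER_1K = 1.00


def cost_usd(num_images, price_per_thousand):
    """Linear cost estimate; ignores free tiers and volume discounts."""
    return num_images / 1000 * price_per_thousand


if __name__ == "__main__":
    images = 10_000  # hypothetical monthly volume
    print("Textract:   ", cost_usd(images, TEXTRACT_PER_1K))
    print("Rekognition:", cost_usd(images, REKOGNITION_PER_1K))
```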
If there is simply random text in the picture, then use Amazon Rekognition. It will find text in any location.
Amazon Textract is designed for converting paper documents into organized data. It will probably not work well with a random picture (although I haven't tried it so I can't be certain!).
I need to implement product search by image in my app. After doing some research, I want to go with AWS Rekognition: when the model predicts a label for the image, I can pass that label to my API to query products. That is my plan. However, I also came across AWS visual search (using Amazon SageMaker), which is way beyond my understanding. Am I on the right track with the first option (AWS Rekognition)?
Amazon Rekognition is 'out-of-the-box' image recognition. It can label pictures, find faces, read text, etc. It accepts custom labels; however, it is not possible to modify the general recognition process.
Amazon SageMaker is a machine learning platform for building your own models. It is highly flexible, for everything from image recognition through to predictive analytics. However, it is quite complex and is usually used by Data Scientists.
Given your knowledge levels, Amazon Rekognition would be a better choice for you.
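The plan described in the question (predict labels, then query products by label) could look roughly like the sketch below. detect_labels is the real boto3 Rekognition call; the helper names, the 80% confidence cut-off, and the lowercasing of label names are assumptions made for this example, and the product query itself is left to your own API.

```python
def label_names(response, min_confidence=80.0):
    """Return label names from a DetectLabels response, most confident first.

    The confidence cut-off and the lowercasing are illustrative choices.
    """
    labels = [
        label for label in response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    ]
    labels.sort(key=lambda label: label["Confidence"], reverse=True)
    return [label["Name"].lower() for label in labels]


def labels_for_image(image_bytes, max_labels=5):
    """Call Rekognition and return search terms (hypothetical helper).

    Assumes AWS credentials and a default region are already configured.
    """
    import boto3  # imported lazily so label_names works without the SDK

    client = boto3.client("rekognition")
    response = client.detect_labels(Image={"Bytes": image_bytes}, MaxLabels=max_labels)
    return label_names(response)
```

The returned names can then be passed to the product search endpoint as keywords. Generic labels such as "clothing" may be too broad for product search, which is where Rekognition custom labels could help.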
I need to process a large quantity of multipage PDFs (around 23,000 documents with an average of 30 pages each) into text. Since the documents are typewritten and scanned, I want to use OCR to avoid character recognition mistakes. The problem is that the estimated running time in R (using the Tesseract package) is crazy. Is there an online service provider that can be used for this task?
N.B. I had a look at both Amazon Web Services and Google Cloud, but it is extremely difficult for me to understand how to use them, especially how to automate the whole process.
I need to transcribe a large number of handwritten documents. I tried cloud services from Google, Amazon, and Microsoft, namely:
https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/
https://cloud.google.com/vision/docs/handwriting
https://aws.amazon.com/textract/
Unfortunately, none of them achieved good enough results. I suspect it is because my documents have a weird handwriting style, and as a result, the networks struggle a lot.
I searched for a way to fine-tune these models (with manually transcribed data), but I have not found anything online, so as a last resort, I am asking here.
If it is possible to fine-tune one of these models, could you please point me to some resources?
You are correct: with the Computer Vision API in Azure Cognitive Services, you cannot upload your own data to train the API to recognise the handwriting in your documents, I'm afraid. I can't comment on the offerings from AWS and Google, but it is certainly not possible with Azure.
I am fairly new to AWS Comprehend. I know that AWS Comprehend can custom-classify documents (text files). Does AWS Comprehend also classify image files? Also, while training the model, is it necessary to give the entire document text in the CSV, or will just keywords do?
The reason is that I want to build a custom classifier that can classify invoices, pay stubs, and a few other such document types, which are in image formats. Can Comprehend do this? If so, how?
I googled quite a lot but couldn't find anything relevant. I'd really appreciate your help with this.
Thank you!
Comprehend doesn't do this natively, so you would have to build a solution. Something you could try is to combine Amazon Textract (for extracting the details from the documents) and then Comprehend to classify them.
The Textract FAQ calls this out as a common use case. I couldn't find an exact example of someone doing this, though.
Amazon Comprehend only works on text.
Amazon Rekognition works on images.
AWS has all the building blocks to accomplish this, but you will have to configure/build this yourself. You can use AWS Textract to extract all the text from a document, and then pass the text into the AWS Comprehend service to do the classification for document type.
Before you can do this, you need to train Comprehend to identify the document types correctly. Configure and train a custom classifier in AWS Comprehend, supplying a CSV file in which each row holds a classification (for example, the document type) followed by text that would appear in that type of document. If the documents are just forms, you can use Textract's Forms feature to get only the key-value pairs, then use the keys (the labels in the form) as the text for the custom classifier.
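A minimal sketch of that preparation step is shown below, assuming the Textract responses come from analyze_document called with FeatureTypes=["FORMS"]. The helper names and the example labels (INVOICE, PAY_STUB) are hypothetical. The two-column "label,text" layout with no header row matches what Comprehend's multi-class custom classifier expects, but check the current Comprehend documentation before training on it.

```python
import csv
import io


def page_text(textract_response):
    """Join all LINE blocks of a Textract response into one string."""
    return " ".join(
        b["Text"] for b in textract_response.get("Blocks", [])
        if b.get("BlockType") == "LINE"
    )


def form_keys(textract_response):
    """Return the form field labels (KEY blocks) found by the FORMS feature."""
    blocks = textract_response.get("Blocks", [])
    by_id = {b["Id"]: b for b in blocks}
    keys = []
    for block in blocks:
        is_key = (
            block.get("BlockType") == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        words = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":  # CHILD ids point at the key's WORD blocks
                words.extend(
                    by_id[i]["Text"] for i in rel["Ids"]
                    if by_id[i].get("BlockType") == "WORD"
                )
        if words:
            keys.append(" ".join(words))
    return keys


def training_csv(examples):
    """Build 'label,text' rows (no header) from (label, text) pairs."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for label, text in examples:
        # Collapse whitespace so each document stays on a single CSV row.
        writer.writerow([label, " ".join(text.split())])
    return buffer.getvalue()
```

For example, training_csv([("INVOICE", " ".join(form_keys(response)))]) produces one training row from a form. The resulting file would then be uploaded to S3 and referenced when creating the custom classifier.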
I am aware that it is better to use AWS Rekognition for this. However, it does not seem to work well when I tried it out with the images I have (which are sort of like small containers with labels on them). The text comes out misspelled and fragmented.
I am new to ML and SageMaker. From what I have seen, the use cases seem to be for prediction and image classification. I could not find one on training a model for detecting text in an image. Is it possible to do it with SageMaker? I would appreciate it if someone pointed me in the right direction.
The different services all provide different levels of abstraction for Optical Character Recognition (OCR), depending on which parts of the pipeline you are most comfortable working with and which you prefer to have abstracted.
Here are a few options:
Rekognition will provide out of the box OCR with the DetectText feature. However, it seems you will need to perform some sort of pre-processing on your images in your current case in order to get better results. This can be done through any method of your choice (Lambda, EC2, etc).
SageMaker is a tool that will enable you to easily train and deploy your own models (of any type). You have two primary options with SageMaker:
Do-it-yourself option: If you're looking to go the route of labeling your own data, gathering a sizable training set, and training your own OCR model, this is possible by training and deploying your own model via SageMaker.
Existing OCR algorithm: There are many algorithms out there that all have different potential tradeoffs for OCR. One example would be Tesseract. Using this, you can more closely couple your pre-processing step to the text detection.
Amazon Textract (in preview at the time of writing) is a purpose-built dedicated OCR service that may offer better performance depending on what your images look like and the settings you choose.
I would personally recommend looking into pre-processing for OCR to see if it improves Rekognition accuracy before moving onto the other options. Even if it doesn't improve Rekognition's accuracy, it will still be valuable for most of the other options!
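As one concrete example of such pre-processing, the sketch below binarizes a grayscale image with Otsu's method, a common step before OCR. This is an illustration chosen for this write-up, not something Rekognition or Textract requires. It works on a flat list of 0-255 pixel values to stay dependency-free; in practice you would more likely use an imaging library and also try resizing, deskewing, or contrast adjustment.

```python
def otsu_threshold(pixels):
    """Return the threshold that maximizes between-class variance (Otsu's method).

    `pixels` is a flat, non-empty list of grayscale values in 0..255.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    weight_bg = 0   # number of pixels at or below the candidate threshold
    sum_bg = 0      # sum of their values
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels):
    """Map pixels above the Otsu threshold to 255 (white) and the rest to 0."""
    t = otsu_threshold(pixels)
    return [255 if p > t else 0 for p in pixels]
```

The binarized image would then be re-encoded (for example as PNG) and sent to the OCR service in place of the original photo.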