aws sagemaker for detecting text in an image - amazon-web-services

I am aware that it is better to use aws Rekognition for this. However, it does not seem to work well when I tried it out with the images I have (which are sort of like small containers with labels on them). The text comes out misspelled and fragmented.
I am new to ML and SageMaker. From what I have seen, the use cases seem to be for prediction and image classification. I could not find one on training a model for detecting text in an image. Is it possible to do it with SageMaker? I would appreciate it if someone pointed me in the right direction.

The different services will all provide different levels of abstraction for Optical Character Recognition (OCR) depending on what parts of the pipeline you are most comfortable with working with, and what you prefer to have abstracted.
Here are a few options:
Rekognition will provide out of the box OCR with the DetectText feature. However, it seems you will need to perform some sort of pre-processing on your images in your current case in order to get better results. This can be done through any method of your choice (Lambda, EC2, etc).
SageMaker is a tool that will enable you to easily train and deploy your own models (of any type). You have two primary options with SageMaker:
Do-it-yourself option: If you're looking to go the route of labeling your own data, gathering a sizable training set, and training your own OCR model, this is possible by training and deploying your own model via SageMaker.
Existing OCR algorithm: There are many algorithms out there that all have different potential tradeoffs for OCR. One example would be Tesseract. Using this, you can more closely couple your pre-processing step to the text detection.
Amazon Textract (In preview) is a purpose-built dedicated OCR service that may offer better performance depending on what your images look like and the settings you choose.
I would personally recommend looking into pre-processing for OCR to see if it improves Rekognition accuracy before moving onto the other options. Even if it doesn't improve Rekognition's accuracy, it will still be valuable for most of the other options!
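As a rough illustration of the Rekognition option above, the sketch below calls DetectText through boto3 and keeps only line-level detections above a confidence threshold, which is one simple way to see whether pre-processing is helping. The helper names (`detect_text`, `confident_lines`) and the 80% threshold are assumptions made for illustration; the response fields (`TextDetections`, `Type`, `DetectedText`, `Confidence`) follow the DetectText API.

```python
def detect_text(image_bytes, region_name=None):
    """Call Rekognition DetectText on raw image bytes (needs AWS credentials)."""
    import boto3  # imported lazily so the parsing helper works without boto3

    client = boto3.client("rekognition", region_name=region_name)
    return client.detect_text(Image={"Bytes": image_bytes})


def confident_lines(response, min_confidence=80.0):
    """Return the text of LINE detections at or above min_confidence.

    The threshold is an illustrative default, not a recommended value.
    """
    return [
        detection["DetectedText"]
        for detection in response.get("TextDetections", [])
        if detection.get("Type") == "LINE"
        and detection.get("Confidence", 0.0) >= min_confidence
    ]
```

Running `confident_lines(detect_text(raw_bytes))` on the same image before and after a pre-processing step gives a quick, if crude, comparison of how much text survives the threshold.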

Related

Clarification on Default SageMaker Distribution Strategy

Context: When using SageMaker distributed training: Let’s say when training a network I do not provide any distribution parameter (keep it to default), but provide 2 instances for the instance_count value in the estimator (could be any deep learning based estimator, e.g., PyTorch).
In this scenario would there be any distributed training taking place? If so, what strategy is used by default?
NOTE: I can see that both instances’ GPUs are actively used, but I am wondering what sort of distributed training takes place by default.
If you're using custom code (custom Docker, custom code in a Framework container), the answer is NO. Unless you are writing distributed code (Horovod, PyTorch DDP, MPI...), SageMaker will not distribute things for you. It will launch the same Docker or Python code N times, once per instance. Think of the SageMaker Training API as a whiteboard that can create multiple connected and configured machines for you. But the code is still yours to write. SageMaker Distributed Training Libraries can make distributed code much easier to write though.
If you're using a built-in algorithm, the answer is that it depends. Some SageMaker built-in algorithms are natively multi-machine, like SM XGBoost or SM Random Cut Forest.
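To make the "same code N times" point concrete, here is a minimal sketch of how a training script could discover where it is running. It assumes the container uses the SageMaker training toolkit, which exposes the cluster layout through the `SM_HOSTS` and `SM_CURRENT_HOST` environment variables; the function name `cluster_info` is made up for this example. Knowing the rank does not distribute anything by itself, it is only the starting point for code such as PyTorch DDP.

```python
import json
import os


def cluster_info(environ=None):
    """Return (rank, world_size, hosts) from SageMaker's environment variables.

    Falls back to a single host named "algo-1" when the variables are absent,
    for example when the script runs outside SageMaker.
    """
    environ = os.environ if environ is None else environ
    hosts = sorted(json.loads(environ.get("SM_HOSTS", '["algo-1"]')))
    current = environ.get("SM_CURRENT_HOST", hosts[0])
    return hosts.index(current), len(hosts), hosts


if __name__ == "__main__":
    rank, world_size, hosts = cluster_info()
    # Every instance runs this same script; only the rank differs.
    print(f"host rank {rank} of {world_size}: {hosts}")
```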

AWS Rekognition vs AWS visual search

I need to implement product search by image in my app. After doing some research, I want to go with AWS Rekognition: when the model predicts a label for the image, I can pass that label to my API to query products. That is my plan. However, I also came across AWS visual search (using AWS SageMaker), which is way beyond my understanding. Am I on the right track by going with the first option (AWS Rekognition)?
Amazon Rekognition is 'out-of-the-box' image recognition. It can label pictures, find faces, read text, etc. It accepts custom labels; however, it is not possible to modify the general recognition process.
Amazon SageMaker is a machine learning platform for building your own models. It is highly flexible, for everything from image recognition through to predictive analytics. However, it is quite complex and is usually used by Data Scientists.
Given your knowledge levels, Amazon Rekognition would be a better choice for you.

AWS Sagemaker - using cross validation instead of dedicated validation set?

When I train my model locally I use a 20% test set and then cross validation. SageMaker seems like it needs a dedicated validation set (at least in the tutorials I've followed). Currently I have 20% test, 10% validation, leaving 70% to train, so I lose 10% of my training data compared to when I train locally, and there is some performance loss as a result of this.
I could just take my locally trained models and overwrite the sagemaker models stored in s3, but that seems like a bit of a work around. Is there a way to use Sagemaker without having to have a dedicated validation set?
Thanks
SageMaker seems to allow a single training set, while in cross validation you iterate over, for example, 5 different training sets, each one validated on a different hold-out set. So it seems that the SageMaker training service is not well suited for cross validation. Of course, cross validation is usually useful with small (to be accurate, low variance) data, so in those cases you can set the training infrastructure to local (so it doesn't take a lot of time) and then iterate manually to achieve cross validation functionality. But it's not something that works out of the box.
Sorry, can you please elaborate on which tutorials you are referring to when you say "SageMaker seems like it needs a dedicated validation set (at least in the tutorials I've followed)"?
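The manual iteration described above could look roughly like the following sketch, which only produces the index splits; training and scoring each fold (locally or as separate jobs) is left out. The function name `k_fold_indices` is hypothetical, and the sketch assumes the data has already been shuffled, since it slices contiguous blocks.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs; each sample is held out exactly once.

    Assumes the data is already shuffled, because folds are contiguous slices.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size
```

With this approach no data is permanently set aside for validation: every sample is used for training in k - 1 of the folds, and the validation metric is averaged over all k runs.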
SageMaker training exposes the ability to separate datasets into "channels" so you can separate your dataset in whichever way you please.
See here for more info: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo-running-container.html#your-algorithms-training-algo-running-container-trainingdata
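As a small sketch of the channel idea, the helper below builds a mapping from channel names to S3 locations. The helper name `build_channels` and the bucket layout (one sub-prefix per channel) are assumptions for illustration only. A dictionary of this shape is what the SageMaker Python SDK accepts as the `inputs` argument of `estimator.fit(...)`, and each channel then appears inside the training container under `/opt/ml/input/data/<channel name>`.

```python
def build_channels(s3_prefix, names=("train", "validation")):
    """Map channel names to S3 URIs under a common prefix.

    The "<prefix>/<channel>/" layout is a hypothetical convention.
    """
    prefix = s3_prefix.rstrip("/")
    return {name: f"{prefix}/{name}/" for name in names}
```

For cross validation, one option is to call this once per fold with a fold-specific prefix, or to pass only a `train` channel and do the splitting inside your own training code.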

Amazon Textract vs Amazon Rekognition DetectText

How do I decide when to use Amazon Textract vs Amazon Rekognition's DetectText method?
My use case is to take a picture on a mobile device, convert the image data into text, and store it in AWS RDS.
https://aws.amazon.com/blogs/aws/amazon-rekognition-image-detection-and-recognition-powered-by-deep-learning/
https://aws.amazon.com/textract/
With respect to end-to-end problem solving, Textract will perform better because it is more fully featured for OCR. If you're simply trying to pull a line or two of text from a picture shot in the wild, like street signs or billboards, (ie: not a document or form) I'd recommend Amazon Rekognition.
Amazon Textract is a newer AWS service that was created as a purpose-built solution to the problem of OCR (optical character recognition) in images of documents and PDFs. While Rekognition is a more generalizable computer vision service, Textract has many more OCR-oriented tuning parameters to optimize the process of accurately and effectively extracting text.
Out of the box, if all you are trying to do is detect text and the relevant metadata (coordinates, angle, confidence value), the Rekognition DetectText method will likely perform similarly to the equivalent analyze_document method in Textract. However, Textract offers further semantic structuring that helps with text curation and formatting, replacing post-processing that the developer would traditionally need to write themselves.
Lastly, when comparing the costs of the two Detect Text methods, Textract costs a bit more ($1.50/1k images) compared to Rekognition ($1.00/1k images).
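For comparison with Rekognition, here is a minimal sketch of pulling lines of text out of a Textract response. Note that it uses `detect_document_text`, Textract's plain text-detection call, rather than the `analyze_document` method mentioned above, because only the raw lines are needed here. The helper names are made up for this example; the response fields (`Blocks`, `BlockType`, `Text`, `Confidence`) follow the Textract API.

```python
def detect_document_lines(image_bytes, region_name=None):
    """Run Textract text detection on raw image bytes (needs AWS credentials)."""
    import boto3  # imported lazily so extract_lines works without boto3

    client = boto3.client("textract", region_name=region_name)
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    return extract_lines(response)


def extract_lines(response):
    """Return (text, confidence) for every LINE block in a Textract response."""
    return [
        (block["Text"], block["Confidence"])
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]
```

The resulting list of lines is what you would then insert into a database table; how it maps to columns depends on your schema.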
If there is simply random text in the picture, then use Amazon Rekognition. It will find text in any location.
Amazon Textract is designed for converting paper documents into organized data. It will probably not work well with a random picture (although I haven't tried it so I can't be certain!).

How to make my datalab machine learning run faster

I have some data, 3.2 million entries in a CSV file. I'm trying to use a CNN estimator in TensorFlow to train the model, but it's very slow. Every time I run the script it gets stuck, and the web page (localhost) just refuses to respond anymore. Any recommendations? (I've tried with 22 CPUs and I can't increase it anymore.)
Can I just run it and use a thread, like the command line python xxx.py & to keep the process going? And then go back to check after some time?
Google offers serverless machine learning with TensorFlow for precisely this reason. It is called Cloud ML Engine. Your workflow would basically look like this:
Develop the program to train your neural network on a small dataset that can fit in memory (iron out the bugs, make sure it works the way you want)
Upload your full data set to the cloud (Google Cloud Storage or BigQuery or &c.) (documentation reference: training steps)
Submit a package containing your training program to Cloud ML Engine (this will point to the location of your full data set in the cloud) (documentation reference: packaging the trainer)
Start a training job in the cloud; this is serverless, so it will take care of scaling to as many machines as necessary, without you having to deal with setting up a cluster, &c. (documentation reference: submitting training jobs).
You can use this workflow to train neural networks on massive data sets - particularly useful for image recognition.
If this is a little too much information, or if this is part of a workflow that you'll be doing a lot and you want to get a stronger handle on it, Coursera offers a course on Serverless Machine Learning with Tensorflow. (I have taken it, and was really impressed with the quality of the Google Cloud offerings on Coursera.)
I am sorry for answering even though I am completely ignorant of what datalab is, but have you tried batching?
I am not sure whether it is possible in this scenario, but you could feed in only 10,000 entries at a time and repeat this in batches until all entries have been processed.
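A minimal sketch of that batching idea, using only the standard library: the file is streamed in chunks instead of being loaded in one go. The function names are hypothetical, and the sketch assumes the first row of the CSV is a header; how each batch is fed to the model is left to the training code.

```python
import csv
from itertools import islice


def iter_batches(rows, batch_size=10_000):
    """Yield lists of up to batch_size items from any iterable of rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def iter_csv_batches(path, batch_size=10_000):
    """Stream a CSV file in batches without loading it all into memory."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)  # assumes the first row is a header
        for batch in iter_batches(reader, batch_size):
            yield header, batch
```

Each `(header, batch)` pair can then be converted to arrays and passed to one training step, so memory use depends on the batch size rather than on the 3.2 million rows.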