Basic text detection APIs (e.g. Google's) do not return anything useful for the following image. To try Google's Vision API, save the image locally and run:
gcloud ml vision detect-text <local-path-to-image> | grep description
It may return gibberish. The text we want is RAW9405. Are there any existing models for this, or does it require training?
One option is craft-text-detector, which is open source. It gives you the bounding box coordinates of every single word. Based on the y-axis positions you can group the words into lines (sentences), and then use Tesseract for recognition.
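The grouping step can be sketched as follows. This is a minimal illustration, not code from craft-text-detector: it assumes each word box has already been reduced to an axis-aligned tuple (x_min, y_min, x_max, y_max), and the function name and the tolerance parameter are hypothetical.

```python
from statistics import median


def group_words_into_lines(boxes, tolerance=0.5):
    """Group word boxes into text lines by vertical position.

    boxes: iterable of (x_min, y_min, x_max, y_max) tuples (assumed format).
    tolerance: fraction of a word's height by which its vertical centre may
               differ from the current line's centre and still join that line.
    Returns a list of lines, each a list of boxes sorted left to right.
    """
    lines = []
    # Visit the words from top to bottom, using the vertical centre of each box.
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        center_y = (box[1] + box[3]) / 2
        height = box[3] - box[1]
        if lines:
            current = lines[-1]
            line_center = median((b[1] + b[3]) / 2 for b in current)
            if abs(center_y - line_center) <= tolerance * height:
                current.append(box)
                continue
        # The word is too far below the current line: start a new line.
        lines.append([box])
    # Within each line, order the words by their left edge.
    return [sorted(line, key=lambda b: b[0]) for line in lines]


if __name__ == "__main__":
    words = [(120, 10, 200, 30), (10, 12, 100, 32), (10, 60, 90, 80)]
    for line in group_words_into_lines(words):
        print(line)
```

Each resulting line can then be cropped from the image and passed to Tesseract. The tolerance value is a guess and would likely need tuning for skewed or tightly spaced text.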
Related
I have uploaded an image in my Google Cloud Storage inside a bucket. Then I open the gcloud command line and I run the following:
gcloud ml vision detect-faces gs://my-bucket/face.png
I can see the result in JSON format, so I understand that it returns the position of the face and some facial features.
How can I save/export a cropped image containing just the face, using the gcloud command line?
In other words, in the example below, how can I export the area within the larger green box as a separate image?
The face detection API returns a rectangle that identifies each face it finds. To create a new image containing just the face (which could replace the original image if desired), you can use a tool like ImageMagick, which can be run from the command line. It takes a source image and a set of commands as input and generates a new image. One of those commands is -crop, which crops an image to a given rectangle (here, the face box).
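As a rough sketch of how the two pieces fit together, the helper below turns the vertices of a face box into an ImageMagick -crop geometry of the form WxH+X+Y. It assumes the vertices come from the boundingPoly.vertices field of a face annotation and that a coordinate equal to 0 may be omitted from the JSON. The function names and file names are hypothetical.

```python
def crop_geometry(vertices):
    """Build an ImageMagick geometry string (WxH+X+Y) from box vertices.

    vertices: list of dicts such as {"x": 10, "y": 20}; a missing key is
              treated as 0 (assumption about the JSON output).
    """
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    left, top = min(xs), min(ys)
    width, height = max(xs) - left, max(ys) - top
    # ":+d" always prints the sign, which the geometry syntax expects.
    return f"{width}x{height}{left:+d}{top:+d}"


def crop_command(src, dst, vertices):
    """Return the ImageMagick command line as an argument list."""
    # +repage resets the virtual canvas so the output is just the crop.
    return ["convert", src, "-crop", crop_geometry(vertices), "+repage", dst]


if __name__ == "__main__":
    face_box = [
        {"x": 10, "y": 20},
        {"x": 110, "y": 20},
        {"x": 110, "y": 140},
        {"x": 10, "y": 140},
    ]
    print(" ".join(crop_command("face.png", "face_cropped.png", face_box)))
```

ImageMagick works on local files, so an image stored in a bucket would first have to be downloaded. The argument list can be passed to subprocess.run once ImageMagick is installed.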
I am attempting to use Azure Computer Vision OCR to recognize text in a JPG image. The image is about 2000x3000 pixels and is a picture of a contract. I want to get all the text and the bounding boxes. The image DPI is over 300 and its quality is very clear. I noticed that a lot of text was being skipped, so I cropped a section of the image and submitted that instead. This time it recognized text that it had not recognized before. Why would it do this? If the quality of the image never changed and the image was within the resolution requirements, why is it skipping text?
I am using AWS Rekognition to detect text in a PDF that has been converted into a JPEG.
The image that I am using has text that is approximately size 10-12 on a regular letter page. However, the font changes several times throughout the image.
Are the missed detections and low confidence levels due to the text changing so often? Or to the small font?
Essentially, I'd like to know what kind of image/text I need in order to get the best results from a text detection algorithm.
The DetectText API can detect up to 50 words in an image, and to be detected, text must be within +/- 30 degrees orientation of the horizontal axis.
You are trying to extract a page full of text; that's the problem :)
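The two limits quoted above can be written down as a quick pre-check. This is only an illustrative sketch: the constants are the numbers from this answer, and the function names and the way the baseline is passed in (two points along the bottom edge of a text line) are assumptions, not part of the Rekognition API.

```python
import math

MAX_WORDS = 50            # word limit quoted in the answer above
MAX_TILT_DEGREES = 30.0   # orientation limit quoted in the answer above


def baseline_tilt(start, end):
    """Angle in degrees between a text baseline and the horizontal axis.

    start, end: (x, y) points at the left and right ends of the baseline.
    """
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    return math.degrees(math.atan2(dy, dx))


def fits_detect_text_limits(word_count, start, end):
    """True if the word count and the tilt are both within the quoted limits."""
    tilt_ok = abs(baseline_tilt(start, end)) <= MAX_TILT_DEGREES
    return word_count <= MAX_WORDS and tilt_ok


if __name__ == "__main__":
    # A full page of text: horizontal, but far more than 50 words.
    print(fits_detect_text_limits(400, (0, 0), (100, 0)))
    # A short, slightly tilted label.
    print(fits_detect_text_limits(5, (0, 0), (100, 10)))
```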
AWS now provides Amazon Textract, a service specifically intended for OCR on images and documents.
I wish to use the Google Cloud Vision API to generate features from images, which I will then use to train an SVM for an emotion recognition problem. Please provide a detailed procedure for writing a Python script that uses the Google Cloud Vision API to generate features that I can feed directly into the SVM.
I would go with the following steps:
Training
Create a dataset (training + testing) for whichever emotions you want (such as anger, happiness, etc.). This dataset must be diverse but balanced in terms of gender and age.
Extract the features of each face.
Normalize the whole dataset. Get the bounding box around faces and cut them from images. Also, normalize the sizes of each face.
Align the faces by using Roll and Eye coordinates which can be acquired from Google API.
Train an SVM (validate it, etc.).
Testing
Acquire an image.
Extract the features.
Normalize and align the face.
Use SVM.
Libraries that I suggest:
scikit-learn - SVM
OpenCV - Image Manipulations
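The alignment and normalization steps can be sketched in plain Python. This is one possible reading of the procedure, not the only one: it assumes the features are facial landmark positions, takes the eye that appears on the left of the image as left_eye, and uses hypothetical function names.

```python
import math


def roll_from_eyes(left_eye, right_eye):
    """In-plane rotation of the face in degrees, from the two eye positions."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))


def rotate_point(point, center, degrees):
    """Rotate point around center by the given angle."""
    angle = math.radians(degrees)
    x, y = point[0] - center[0], point[1] - center[1]
    return (
        center[0] + x * math.cos(angle) - y * math.sin(angle),
        center[1] + x * math.sin(angle) + y * math.cos(angle),
    )


def aligned_features(landmarks, left_eye, right_eye):
    """Turn landmark points into a flat, aligned, size-normalized vector.

    The face is rotated so that the eyes are level, moved so that the point
    between the eyes is the origin, and scaled by the distance between the eyes.
    """
    roll = roll_from_eyes(left_eye, right_eye)
    center = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)
    eye_distance = math.dist(left_eye, right_eye)
    features = []
    for point in landmarks:
        x, y = rotate_point(point, center, -roll)  # undo the roll
        features.append((x - center[0]) / eye_distance)
        features.append((y - center[1]) / eye_distance)
    return features


if __name__ == "__main__":
    left, right = (0.0, 0.0), (10.0, 10.0)
    print(aligned_features([left, right, (5.0, 15.0)], left, right))
```

One such vector per face, together with its emotion label, could then be passed to an SVM such as scikit-learn's sklearn.svm.SVC. The same function would be applied to new images at test time.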
I know there are many vision recognition APIs, such as Clarifai, Watson, Google Cloud Vision, and Microsoft Cognitive Services, that recognize image content. The response of these services is simple JSON that contains different tags, for example:
{
  "man": 0.9969295263290405,
  "portrait": 0.9949591159820557,
  "face": 0.9261120557785034
}
The problem is that I need to know not only what is in the image but also the position of each object. Some of those APIs have such a feature, but only for face detection.
So does anyone know whether such an API exists, or do I need to train my own Haar cascades in OpenCV for every object?
I would be very grateful for any information.
You could take a look at Wolfram Cloud/Mathematica.
It has the ability to detect object locations in a picture.
Some examples:
Detecting road signs.
Finding Waldo.
Object tracking in video.