Why would Computer Vision recognize more text when submitting a subset of the image?

So I am attempting to use Azure Computer Vision OCR to recognize text in a JPG image. The image is about 2000x3000 pixels and is a picture of a contract. I want to get all the text and the bounding boxes. The image's DPI is over 300 and its quality is very clear. I noticed that a lot of text was being skipped, so I cropped a section of the image and submitted that instead. This time it recognized text that it did not recognize before. Why would it do this? If the quality of the image never changed and the image was within the bounds of the resolution requirements, why is it skipping text?

Related

Text Recognition through AWS Rekognition Fails to Detect Majority of Text

I am using AWS Rekognition to detect text from a PDF that is converted into a JPEG.
The image that I am using has text that is approximately size 10-12 on a regular letter page. However, the font changes throughout the image several times.
Are my lack of detection and low confidence levels due to having a document where the text changes often? The small font?
Essentially, I'd like to know what kind of image/text I need to get the best results from a text detection algorithm.
The DetectText API can detect up to 50 words in an image, and to be detected, text must be within +/- 30 degrees orientation of the horizontal axis. You are trying to extract a page full of text; that's the problem :)
AWS now provides the AWS Textract service, which is specifically intended for OCR of images and documents.
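For illustration, here is a minimal sketch of calling Textract from Python via boto3; the file name and region are placeholders, and it assumes AWS credentials are already configured:

import boto3

# Placeholder region and file name; adjust to your setup.
client = boto3.client("textract", region_name="us-east-1")
with open("page.jpg", "rb") as f:
    response = client.detect_document_text(Document={"Bytes": f.read()})

# Blocks come back as PAGE, LINE, and WORD items; print the lines.
for block in response["Blocks"]:
    if block["BlockType"] == "LINE":
        print(block["Text"])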

Capture a specific location of an image using OpenCV

I am trying to organize my trading card collection digitally and am working on building a scanner that uses OCR to detect the names of the cards in my collection.
I need to use a webcam to snap a single image of each card in question. Snapping the image doesn't seem to be too difficult, but I need help determining how to get OpenCV to capture only a specific part of that image for OCR to work with. I'm trying to capture just the text portion of the image so that the artwork on the cards doesn't interfere with the OCR.
If my card will be placed in the same physical location each time, is there a way to get OpenCV to take an image and focus on just the area of the image that I'm interested in?
Thank You
Sour Jack
I am not sure I understand the problem. Do you want to use your OCR algorithm always on the same portion of the snapshot? If so, you can try something like:
roi = img[y:y+height, x:x+width]
There is more information here: http://answers.opencv.org/question/29260/how-to-save-a-rectangular-roi/
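Putting it together, a minimal sketch using OpenCV's Python bindings; the ROI coordinates below are placeholders you would tune to where the name text sits on your cards:

import cv2

# Placeholder coordinates for the card's name box; tune for your rig.
x, y, width, height = 50, 30, 400, 60

cap = cv2.VideoCapture(0)      # open the default webcam
ok, img = cap.read()           # snap a single frame
cap.release()

if ok:
    roi = img[y:y+height, x:x+width]       # keep only the text portion
    cv2.imwrite("name_region.png", roi)    # feed this crop to your OCR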

How to make a motion history image for a presentation as one single image?

I am working on a project on gesture recognition. Now I want to prepare a presentation in which I can only show images. I have a series of images defining a gesture, and I want to show them in a single image, just like motion history images are shown in the literature.
My question is simple: which functions in OpenCV can I use to make a motion history image using, let's say, 10 or more images defining the motion of a hand?
As an example I have the following image, and I want to show the hand's location (opacity directly dependent on time reference).
I tried using GIMP to merge layers with different opacities to do the same thing; however, the output is not good.
You could use cv::updateMotionHistory.
Actually, OpenCV also demonstrates its usage in samples/c/motempl.c.
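For a Python take on the same idea, here is a minimal sketch; it assumes opencv-contrib-python is installed (motion templates live in the contrib motempl module in OpenCV 3+), and the frame file names are placeholders:

import cv2
import numpy as np

# Placeholder file names; load your 10+ gesture images in time order.
frames = [cv2.imread("gesture_%d.png" % i, cv2.IMREAD_GRAYSCALE) for i in range(10)]

duration = float(len(frames))   # keep the whole gesture visible in the MHI
mhi = np.zeros(frames[0].shape, np.float32)

prev = frames[0]
for t, frame in enumerate(frames[1:], start=1):
    diff = cv2.absdiff(frame, prev)                     # where motion happened
    _, silhouette = cv2.threshold(diff, 30, 1, cv2.THRESH_BINARY)
    mhi = cv2.motempl.updateMotionHistory(silhouette, mhi, float(t), duration)
    prev = frame

# Newer motion ends up brighter, older motion fainter, as in the literature.
cv2.imwrite("mhi.png", np.uint8(mhi / duration * 255))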

How to correct RGB colors taken with a camera?

I had this question in my mind lately: I took a photo of a picture on my computer's display using my phone's camera (2 MP), then transferred the photo to my computer. What I noticed is that the individual pixel (RGB) values of the photographed image are different from the original picture (which is obvious!), but the color looks the same. So what I intend to do is modify the photographed image so that its pixel color values (RGB) are the same as the original's (100% if possible); in other words, make every pixel identical to the original picture, without making use of the original picture.
I do not know if this is possible or not, but any help will be extremely appreciated. I'm using Visual C++ 2005 with the CImg library for processing images.
Thanks in advance!
I'm more interested in WHY you photograph your own screen. If you are running Windows, you can just press the "Print Screen" key on your keyboard and then open, for example, Paint and paste the image.
The colour difference is there because you have not white balanced your camera. Even most mobile phone cameras can white balance. Take a picture filled entirely with white on your screen, then tell the camera to use it as a reference for white. Take your photo and it should be correct.
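The same correction can also be applied after the fact in software: sample a patch that should be white and rescale each channel so that patch becomes pure white. A minimal NumPy sketch of the idea (file names and the patch coordinates are placeholders; the same arithmetic ports directly to CImg):

import numpy as np
from PIL import Image

photo = np.asarray(Image.open("photo.jpg"), dtype=np.float64)

# Average RGB over a patch known to show the screen's white (placeholder box).
white_ref = photo[100:150, 100:150].reshape(-1, 3).mean(axis=0)

# Scale each channel so the reference maps to pure white, then clip.
corrected = np.clip(photo * (255.0 / white_ref), 0, 255).astype(np.uint8)
Image.fromarray(corrected).save("corrected.jpg")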

C++ Library for image recognition: images containing words to string

Does anyone know of a C++ library for taking an image and performing image recognition on it such that it can find letters based on a given font and/or font height? Even one that doesn't let you select a font would be nice (e.g. readLetters(Image image)).
I've been looking into this a lot lately. Your best bet is simply Tesseract. If you need layout analysis on top of the OCR, then go with Ocropus (which in turn uses Tesseract to do the OCR). Layout analysis refers to being able to detect the position of text in the image and do things like line segmentation, block segmentation, etc.
I've found some really good tips through experimentation with Tesseract that are worth sharing. Basically I had to do a lot of preprocessing for the image.
Upsize/Downsize your input image to 300 dpi.
Remove color from the image. Greyscale is good. I actually used a dither threshold and made my input black and white.
Cut out unnecessary junk from your image.
For all three steps above I used netpbm (a set of image manipulation tools for Unix) to get to the point where I was getting pretty much 100 percent accuracy for what I needed (see the sketch below for a rough Python equivalent).
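For anyone working from Python instead of netpbm, a rough sketch of those three steps might look like this (the scale factor, threshold choice, and crop margins are placeholders to tune; Otsu thresholding stands in for the dither threshold I used):

import cv2

img = cv2.imread("contract.jpg", cv2.IMREAD_GRAYSCALE)   # remove color

# Upsize toward ~300 dpi; the 2x factor is a placeholder for your source.
img = cv2.resize(img, None, fx=2.0, fy=2.0, interpolation=cv2.INTER_CUBIC)

# Make the input black and white; Otsu picks the threshold automatically.
_, bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# Cut out unnecessary junk at the margins (placeholder crop).
bw = bw[50:-50, 50:-50]
cv2.imwrite("preprocessed.png", bw)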
If you have a highly customized font and go with Tesseract alone, you have to "train" the system; basically, you have to feed it a bunch of training data. This is well documented on the tesseract-ocr site. You essentially create a new "language" for your font and pass it in with the -l parameter.
The other training mechanism I found was with Ocropus, using neural net (bpnet) training. It requires a lot of input data to build a good statistical model.
In terms of invoking them, Tesseract and Ocropus are both C++. It won't be as simple as ReadLines(Image), but there is an API you can check out. You can also invoke them via the command line.
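As one concrete route, the command line is as simple as "tesseract preprocessed.png out", and from Python the third-party pytesseract wrapper drives the same engine; lang mirrors the -l parameter described above, and the file name is a placeholder:

import pytesseract
from PIL import Image

img = Image.open("preprocessed.png")   # placeholder input image

# lang mirrors tesseract's -l flag; "eng" is the stock English model.
# A custom trained font "language" would be passed here instead.
text = pytesseract.image_to_string(img, lang="eng")
print(text)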
While I cannot recommend one in particular, the term you are looking for is OCR (Optical Character Recognition).
There is tesseract-ocr, which is a professional library for doing this.
From their web site:
The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but it is probably one of the most accurate open source OCR engines available
I think what you want is Conjecture. It used to be the libgocr project. I haven't used it for a few years, but it used to be very reliable if you set up a key.
The Tesseract OCR library gives pretty accurate results; it's a C and C++ library.
My initial results were around 80% accurate, but after applying pre-processing to the images before supplying them for OCR, the results were around 95% accurate.
What is pre-processing?
1) Binarize the bitmap (B&W worked better for me).
2) Resample your image to 300 dpi.
3) Save your image in a lossless format, such as LZW TIFF or CCITT Group 4 TIFF.
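As a rough Python illustration of those three steps (the threshold and file names are placeholders, and Pillow's TIFF writer with libtiff support stands in for whatever tool you prefer):

from PIL import Image

img = Image.open("scan.jpg").convert("L")   # greyscale first

# 1) Binarize with a fixed threshold; 128 is a placeholder to tune.
bw = img.point(lambda p: 255 if p > 128 else 0, mode="1")

# 2) + 3) Tag the output as 300 dpi (resize beforehand if the source is
# smaller) and save losslessly as CCITT Group 4 TIFF (needs a 1-bit image).
bw.save("scan.tif", dpi=(300, 300), compression="group4")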