Rekognition for numbers aligned vertically? - amazon-web-services

Rekognition does great with traditional horizontally aligned numbers but doesn't work well when numbers are vertically aligned (top to bottom). Can anyone think of a way to use Rekognition for vertically aligned numbers?
I've tried cropping the image and rotating it, but with the same poor results.
I use Python, but that doesn't really matter since Rekognition does the work internally. (See the attached example; it seems very clear to me and would work perfectly if the numbers were aligned horizontally.)

You would need to write some code that looks at the bounding boxes of the returned text and makes some assumptions about how they align.
For example:
If the bounding boxes mostly overlap horizontally:
Sort them in order of vertical position
Confirm that the vertical spacing is within a given distance
Concatenate them into one string
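A minimal sketch of that grouping logic with boto3 (the 0.5 overlap ratio, the max_gap threshold, and anchoring on the first detected word are illustrative choices, not part of the Rekognition API):

```python
import boto3

def read_vertical_number(image_bytes, max_gap=0.05):
    """Group vertically stacked digits detected by Rekognition into one
    string. max_gap is the largest allowed vertical spacing between
    consecutive boxes, as a fraction of image height."""
    client = boto3.client("rekognition")
    response = client.detect_text(Image={"Bytes": image_bytes})

    # Keep WORD-level detections together with their bounding boxes.
    words = [
        (d["Geometry"]["BoundingBox"], d["DetectedText"])
        for d in response["TextDetections"]
        if d["Type"] == "WORD"
    ]
    if not words:
        return ""

    # "Mostly overlap horizontally": shared width > half the narrower box.
    def overlaps(a, b):
        left = max(a["Left"], b["Left"])
        right = min(a["Left"] + a["Width"], b["Left"] + b["Width"])
        return right - left > 0.5 * min(a["Width"], b["Width"])

    anchor = words[0][0]
    column = sorted(
        (w for w in words if overlaps(anchor, w[0])),
        key=lambda w: w[0]["Top"],  # sort top-to-bottom
    )

    # Confirm the vertical spacing, then concatenate into one string.
    digits = [column[0][1]]
    for prev, curr in zip(column, column[1:]):
        gap = curr[0]["Top"] - (prev[0]["Top"] + prev[0]["Height"])
        if gap > max_gap:
            break  # spacing too large: stop the column here
        digits.append(curr[1])
    return "".join(digits)
```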
Update: I tried putting your sample image into Rekognition and it didn't detect the numbers. However, when I cropped the image to a smaller section, it successfully detected the numbers. It also provided them back in "top-down" order.

Related

Motion detection by eliminating constant movements

I am trying to implement motion detection in OpenCV C++. I tried various methods like MOG and optical flow, which work fine, but is there a way to eliminate constant movements in the scene, like a constantly spinning fan? I have OpenCV's accumulateWeighted() in mind but am not sure if it works. Is there a better way to do it?
I don't have a fully robust solution, and I don't have any experience with video processing either, but here is the idea I have for this problem so far:
First, take a few pairs of consecutive frames from the video and convert them to grayscale for a more robust comparison.
For each pair, compute the difference image by comparing corresponding pixels.
The resulting image gives the pixel locations where something changed between the two frames; cluster these pixel locations and put a bounding box around them, so that each bounding box region marks an object that is translating or rotating.
Having applied this difference operation over several pairs, we get a set of such bounding boxes in each pair difference.
Now compare the bounding boxes across the difference images.
If a bounding box shows only a very slight variation in its central location across all difference images, the object it contains has a constant rotational motion (like a fan or leaves); the remaining bounding boxes represent the objects actually translating through the video.
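A rough Python/OpenCV sketch of this scheme (the diff threshold of 25 and the 5 px jitter tolerance are guesses; a real implementation would match boxes between pairs more carefully):

```python
import cv2
import numpy as np

def constant_motion_centers(video_path, n_pairs=10, jitter=5.0):
    """Difference consecutive grayscale frames, box the changed regions,
    and flag boxes whose centers barely move across pairs as constant
    motion (fan, leaves). Thresholds are illustrative."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        return []
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

    centers_per_pair = []
    for _ in range(n_pairs):
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(gray, prev)
        _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        # Cluster changed pixels into boxes via connected components.
        _, _, stats, cents = cv2.connectedComponentsWithStats(mask)
        centers_per_pair.append([tuple(c) for c in cents[1:]])  # skip background
        prev = gray
    cap.release()

    if not centers_per_pair or not centers_per_pair[0]:
        return []
    # A center that stays within `jitter` px in every pair difference
    # marks constant/rotational motion; the rest are real movers.
    return [
        c for c in centers_per_pair[0]
        if all(
            any(np.hypot(c[0] - d[0], c[1] - d[1]) < jitter for d in pair)
            for pair in centers_per_pair[1:]
        )
    ]
```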

opencv - Contour Alignment and comparison

I have tried to use edge detection to find the contours of images and compare their similarity with the matchShapes function. However, the results are not as good as expected. I think it may be because the images are not aligned before the similarity is calculated, so I am asking for a way to align two contours in OpenCV.

I am thinking of aligning them by first finding the smallest bounding box or circle, then finding the translation, rotation, or resizing needed to align those boxes, and finally applying those transformations to the contour and testing the similarity. Does this method work? Is there any method to align images? Thanks for your help.

For your reference, attached are two contours to be tested. They should be very similar, but the distance found is quite large. The first two images have a larger distance than that between the first and the last one, which seems to contradict what it looks like (the last one should be the worst). Thanks.
These kinds of problems are known as registration problems. CPD, BCPD, and ICP would be your best shot.
https://github.com/neka-nat/probreg
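For example, a sketch that aligns the largest contour of one binary image onto another with probreg's CPD before scoring with matchShapes (assumes OpenCV 4.x; treat it as a starting point rather than a tuned solution):

```python
import cv2
import numpy as np
from probreg import cpd  # pip install probreg

def contour_distance(binary_a, binary_b):
    """Align the largest contour of binary_a onto that of binary_b with
    Coherent Point Drift, then compare the aligned shapes."""
    def largest_contour(img):
        contours, _ = cv2.findContours(img, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        return max(contours, key=cv2.contourArea) \
            .reshape(-1, 2).astype(np.float64)

    src = largest_contour(binary_a)
    dst = largest_contour(binary_b)
    tf_param, _, _ = cpd.registration_cpd(src, dst)  # rigid CPD by default
    aligned = tf_param.transform(src)                # src moved onto dst

    # Score the registered contours; smaller means more similar.
    return cv2.matchShapes(aligned.astype(np.float32).reshape(-1, 1, 2),
                           dst.astype(np.float32).reshape(-1, 1, 2),
                           cv2.CONTOURS_MATCH_I1, 0.0)
```

Note that matchShapes is itself fairly invariant to rotation and scale (it compares Hu moments), so once the point sets are registered you could also compare them directly, e.g. with a mean nearest-neighbour distance.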

Python: Reduce rectangles on images to their border

I have many grayscale input images which contain several rectangles. Some of them overlap and some go over the border of the image. An example image could look like this:
Now I have to reduce the rectangles to their borders. My idea was to make every non-white pixel whose Manhattan distance to the nearest white pixel (or to the image border) is at least N (e.g. 3) white. The output should look like this (sorry for the different-sized borders):
It is not very hard to implement this. Unfortunately, the implementation must be fast, because the input may contain extremely many images (e.g. 100,000) and the user has to wait until this step is finished.
I thought about using fromimage and then doing everything with NumPy, but I did not find a good solution.
Maybe someone has an idea or a hint on how this problem could be solved very efficiently?
Calculate the distance transform of the image (OpenCV distanceTransform: http://docs.opencv.org/2.4/modules/imgproc/doc/miscellaneous_transformations.html).
In the resulting image, zero all the pixels that have a value bigger than 3.
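A short sketch of that two-step recipe (DIST_L1 matches the Manhattan distance the question asks for; padding with zeros is one way to make the image border count as white too):

```python
import cv2
import numpy as np

def reduce_to_borders(gray, n=3):
    """Keep only rectangle pixels within n (Manhattan) of a white pixel
    or the image border; whiten everything else."""
    mask = (gray < 255).astype(np.uint8)
    # Pad with zeros so the image border counts as "white" as well.
    padded = cv2.copyMakeBorder(mask, 1, 1, 1, 1,
                                cv2.BORDER_CONSTANT, value=0)
    # Distance from every pixel to the nearest zero (white) pixel.
    dist = cv2.distanceTransform(padded, cv2.DIST_L1, 3)[1:-1, 1:-1]
    out = gray.copy()
    out[dist > n] = 255  # interior pixels are farther than n: whiten
    return out
```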

OpenCV C++ extract features from binary image

I have written an algorithm to process a camera capture and extract a binary image of two features I'm interested in. I'm trying to find the best (fastest) way of detecting when the two features intersect and where the lowest point (greatest y coordinate) is (this will be the intersection).
I do not want to use a findContours() based method as this is too slow and, in my opinion, unnecessary. I also think blob detection libraries are too bloated for this.
I have two sample images (sorry for low quality):
(not touching: http://i.imgur.com/7bQ9qMo.jpg)
(touching: http://i.imgur.com/tuSmKw7.jpg)
Due to the way these images are created, there is often noise in the top right corner which looks like pixelated lines but methods such as dilation and erosion lose resolution around the features I'm trying to find.
My initial thought would be to use direct pixel access to form a width filter and a height filter. The lowest point in the image is therefore the intersection.
I have no idea how to detect when they touch... logically I can see that a triangle is formed when they intersect and that otherwise there is no enclosed black area. Could I flood-fill the image starting from a corner with, say, red, and then calculate how much of the image is still black?
Does anyone have any suggestions?
Thanks
Your suggestion is way slower than finding contours. For binary images, finding contours is very easy and quick, because you just need to find a black pixel followed by a white pixel or vice versa.
Anyway, if you don't want to use it, you can use the vertical projection (vertical profile): from it you will see whether the objects intersect or not.
For example, in the following image, check the letter "n", which is a little like a non-intersecting object, and the letter "o", which is similar to intersecting objects:
By analyzing the histograms you can recognize which one is intersecting and which one is not.
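A small NumPy sketch of the projection idea (in Python rather than C++ for brevity; treating "one run of occupied columns" as touching is a simplification of the histogram analysis described above):

```python
import numpy as np

def features_touch(binary):
    """Column-wise projection: count foreground pixels per column and
    count the runs of nonzero columns. One merged run suggests the two
    features form a single blob, i.e. they touch."""
    profile = (binary > 0).sum(axis=0)       # vertical projection
    occupied = np.concatenate(([0], (profile > 0).astype(int), [0]))
    transitions = np.flatnonzero(np.diff(occupied))
    n_runs = len(transitions) // 2           # each run has a start and end
    return n_runs == 1

def lowest_point(binary):
    """Lowest foreground point (greatest y): the candidate intersection."""
    ys, xs = np.nonzero(binary)
    i = ys.argmax()
    return int(xs[i]), int(ys[i])
```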

How to detect Text Area from image?

I want to detect the text area in an image as a preprocessing step for the Tesseract OCR engine. The engine works well when the input is text only, but it fails when the input image contains non-text content. So I want to detect only the text content in an image. Any idea of how to do that would be helpful, thanks.
Take a look at this bounding box technique demonstrated with OpenCV code (the original answer showed the input image, the eroded image, and the result):
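The original images are gone, but a rough sketch of such a morphological bounding-box pipeline might look like this (the kernel size and Otsu thresholding are illustrative choices):

```python
import cv2

def text_boxes(image_path):
    """Threshold, then dilate so characters merge into word/line blobs,
    and put a bounding box around each blob."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, thresh = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # A wide, short kernel joins neighbouring letters into one region.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))
    merged = cv2.dilate(thresh, kernel, iterations=1)
    contours, _ = cv2.findContours(merged, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours]  # (x, y, w, h) each
```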
Well, I'm not very experienced in image processing, but I hope I can help you with my theoretical approach.
In most cases, text forms parallel, horizontal rows, where the space between rows contains lots of background pixels. This can be utilized to solve the problem.
So... if you collapse each pixel row of the image into a single value, you'll get a 1-pixel-wide image as output. When the input image contains text, the output will very likely show a periodic pattern, where dark areas are repeatedly followed by brighter areas. These "groups" of darker pixels indicate the position of the text content, while the brighter "groups" indicate the gaps between the individual rows.
You'll probably find that the brighter areas are much smaller than the others. Text is much more regular than most other picture elements, so it should be easy to separate.
You have to implement a procedure to detect these periodic recurrences. Once the script can determine that the input picture has these characteristics, there's a high chance that it contains text. (However, this approach can't distinguish between actual text and simple horizontal stripes...)
For the next step, you must find a way to determine the boundaries of the paragraphs using the above-mentioned method. I'm thinking of a pretty simple algorithm which would divide the input image into smaller, narrow stripes (50-100 px) and check these areas separately. Then it would compare the results to build a map of the possible areas filled with text. This method wouldn't be very accurate, but that probably wouldn't bother the OCR system.
And finally, you need to use the text map to run the OCR on the desired locations only.
On the other hand, this method would fail if the input text is rotated more than ~3-5 degrees. There's another drawback: if you have only a few rows, the pattern search will be very unreliable. More rows, more accuracy...
Regards, G.
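A tiny NumPy sketch of that row-profile idea (the darker-than-mean rule and the minimum group count stand in for a proper periodicity test):

```python
import numpy as np

def row_profile_has_text(gray, min_rows=3):
    """Collapse each pixel row into one value and look for alternating
    dark/bright groups, as described above."""
    profile = 255 - gray.mean(axis=1)       # high where rows are dark (ink)
    dark = profile > profile.mean()         # mark darker-than-average rows
    # Count transitions between dark and bright row groups.
    transitions = np.count_nonzero(np.diff(dark.astype(int)))
    return transitions // 2 >= min_rows     # several dark groups: likely text
```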
I am new to stackoverflow.com, but I wrote an answer to a similar question which may be useful to readers who share this one. Whether or not this question is actually a duplicate I'll leave to others, since this one came first; if I should copy and paste that answer here, let me know. I also found this question on Google before the one I answered, so a link may benefit more people, especially since that answer covers several different ways of extracting text areas. For me, when I looked up this question, it did not fit my problem case.
Detect text area in an image using python and opencv
At the current time, the best way to detect text is by using EAST (An Efficient and Accurate Scene Text Detector).
The EAST pipeline is capable of predicting words and lines of text at arbitrary orientations on 720p images, and furthermore, can run at 13 FPS, according to the authors.
An EAST quick-start tutorial can be found here.
The EAST paper can be found here.
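As a starting point, EAST can be run through OpenCV's dnn module; a minimal sketch, assuming the pretrained frozen_east_text_detection.pb model has been downloaded:

```python
import cv2

# Load the pretrained EAST model (frozen_east_text_detection.pb must be
# downloaded separately; see the tutorial above).
net = cv2.dnn.readNet("frozen_east_text_detection.pb")

image = cv2.imread("input.jpg")
blob = cv2.dnn.blobFromImage(
    image, 1.0, (320, 320),                  # EAST wants multiples of 32
    (123.68, 116.78, 103.94), swapRB=True, crop=False,
)
net.setInput(blob)
scores, geometry = net.forward([
    "feature_fusion/Conv_7/Sigmoid",         # per-cell text confidence
    "feature_fusion/concat_3",               # rotated-box geometry
])
# scores/geometry still need decoding into boxes plus non-max
# suppression; the linked tutorial walks through that step.
```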