Searching jpeg/bmp/pdf image for straight lines, circles and text - c++

I want to create an image parser that reads an image containing the following:
1. Straight Lines
2. Circles
3. Arcs
4. Text
I am open to solutions for any image format: JPEG, BMP, or PDF.
I have looked at the QImage documentation. It can give me pixel data that I can store as a 2D matrix. For the moment I shall assume there are only two colours, black and white: white represents an empty pixel and black represents a drawn pixel.
So I will have a sparse matrix like
0 1 1 1 0 0 0
0 0 0 0 0 0 1
0 1 1 0 0 0 1
1 0 0 1 0 0 1
1 0 0 1 0 0 0
0 1 1 0 0 0 0
Now I want to decode this matrix and search for the elements. Searching for horizontal and vertical lines is easy, because for each element I can simply scan its neighbouring row and column elements.
How can I search for other elements (angled lines, circles, arcs and possibly text)?
For text I read that QImage has a text() function, but I don't know which input file types it works for.
Is there any other library that I can consider?
Please note that I just want to be able to read the image; no further processing needs to be done.
Is there any other way I can accomplish this? Or am I being too ambitious?
Thanks

Take a look at the OpenCV library.
It provides most of the standard algorithms used in image detection and vision and the code quality of its implementation is quite high in general.
Notice though that this is a very difficult problem in general, so you will probably need to do a fair amount of research before getting satisfactory solutions.

One interesting way of tackling this would be with machine learning systems, such as neural networks and genetic algorithms. Neural nets in particular are very good at pattern matching and are often seen being used for tasks such as handwriting recognition.
There's a lot of information on this if you search for it. Here's one such article that is an introduction to NNs.
If your input images are always black and white, I don't think it would be too difficult to adapt a code example to get it working.

I suggest Viola-Jones object detection algorithm.
Though the approach is usually applied to face detection, the original article discusses general object detection, which would cover your text, circles and lines.

Related

How can I fill gaps in a binary image in OpenCV?

I have some thresholded images of hand-drawn figures (circuits) but there are some parts where I need to have gaps closed between two points, as I show in the following image:
Binary image
I tried closing (dilation followed by erosion), but it is not working: it doesn't fill the gaps, and it makes the resistors and other components unrecognizable. I couldn't find morph-size and iteration values that give a good result without affecting the rest of the picture, and it's important not to distort the components too much.
I can't use Hough lines because the gaps don't always lie on straight lines.
Result after closing:
Result after closing
int morph_size1 = 2;
Mat element1 = getStructuringElement(MORPH_RECT,
                                     Size(2 * morph_size1 + 1, 2 * morph_size1 + 1),
                                     Point(morph_size1, morph_size1));
Mat dst1; // result matrix
// Note: morphologyEx takes an iteration count directly; the original loop
// recomputed dst1 from `binary` on every pass, so only the last pass counted.
morphologyEx(binary, dst1, MORPH_CLOSE, element1, Point(-1, -1), 2);
imshow("closing", dst1);
Any idea?
Thanks in advance.
My proposal:
find the endpoints of the breaks by means of morphological thinning (select the white pixels having only one white neighbor);
in a small neighborhood around every endpoint, find the closest endpoint by circling* outwards up to a limit radius;
draw a thick segment between them.
*In this step it is very important to look for neighbors in a different connected component, to avoid linking a piece to itself; so you need blob labelling as well.
In this thinning, there are more breaks than in your original picture because I erased the boxes.
Of course, you draw the filling segments in the original image.
This process cannot be perfect, as sometimes endpoints will be missing, and sometimes unwanted endpoints will be considered.
As a refinement, you can try to estimate the direction at each endpoint and only search in an angular sector.
My suggestion is to use a custom convolution filter (cv::filter2D) like the one below (can be larger):
0 0 1/12 0 0
0 0 2/12 0 0
1/12 2/12 0 2/12 1/12
0 0 2/12 0 0
0 0 1/12 0 0
The idea is to fill gaps when there are two line segments near each other. You can also use custom structuring elements to obtain the same effect.

Given 2d array of 0s and 1s, find all the squares in it using backtracking

In this 2D array, 1 represents a point and 0 represents blank area.
For example, this array:
1 0 0 0 1
0 0 1 0 0
0 0 0 0 0
0 0 0 0 1
My answer should be 2, because there are 2 squares (or rectangles) in this array, like this:
All the points should be used, and you can't make another square/rectangle if all of its corners are already used (for instance, we can't make another square from the point in the middle to the point in the top right, because both are already used in other squares). You can reuse a point any number of times as long as at least one corner of the new shape is an unused point.
I could solve it as a plain implementation problem, but I don't understand how backtracking relates to it.
Thanks in advance.
Backtracking: let's take a look at another possible answer to your problem. You listed:
{0,0} to {2,1}
{0,0} to {4,0}
as one solution. Another solution (respecting the rule that a point can be reused as long as one corner is unused) is:
{4,0} to {2,1} (first time {4,0} and {2,1} are used)
{0,0} to {2,1} (first time {0,0} is used)
{0,0} to {4,4} (first time {4,4} is used)
which is 3 moves. Backtracking is designed to explore alternative results using recursion: if you start the search for squares at different positions in the array, you can reach different results.
For example, iterating from {0,0} and going right across each row, trying to find all possible rectangles starting with {0,0}, gives the solution you provided; iterating from {4,0} and going left across each row gives my result.

14 segment display and Tesseract OCR with OpenCV

I am using OpenCV 2.4 and Tesseract 3
I am trying to do an OCR on a 14-segment display from a webcam.
The issue is that when I trained Tesseract, I had to do enough erosion/dilation to fill the gaps in each segment. But the image I am reading from the webcam needs to be pre-processed to remove noise; to do this I use erosions and dilations, and in the resulting picture the segments aren't linked:
What I trained tesseract with (that's the "V" letter) : http://i.imgur.com/NbmVqkb.png (segments are all linked)
What I feed tesseract with : http://i.imgur.com/0E4iXXk.png (some segments are linked, some aren't)
The result of OCR-ing is always different and can be "OVO" as well as "EB". I thought that maybe if I trained tesseract with a more similar version of what I am actually reading (non-linked segments) it could work better but Tesseract can't be trained with blank spaces like this (it says "Empty page").
Does anyone have any idea on how to solve this ?
I tried increasing the size of the erosion/dilation, but then other letters aren't recognized (B and D get confused) and overall results are worse.
Thank you !
EDIT : Basically, what I'd need is a way to link the segments together to make it easier for tesseract to read the character OR a way to train tesseract with unlinked segments (from what I've seen, that can't happen)
Isn't it possible to skip Tesseract for this? It looks like you already have a way of partitioning your image into separate characters. You could number the segments of your display, perhaps as shown here: http://www.randomdata.nl/wiki/index.php/Adruino_14_segment_LED_board, and just decide which of your segments are currently lit. Then you can match that against the known segment combinations for all characters with some form of nearest-distance algorithm to find the best match.
Sticking to the scheme linked above your V could perhaps be encoded as follows:
segment number: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
switched on:    0 1 1 0 0 0 1 0 1 0  0  0  0  0

Compressing BMP methods

I am working on a project to losslessly compress a specific style of BMP images that look like this
I have thought about doing pattern recognition to find repetitive blocks of N x N pixels, but I feel it won't be fast enough in execution time.
Any suggestions?
EDIT: I have access to the dataset that created these images too, I just use the image to visualize my data.
Optical illusions make it hard to tell for sure, but are the colors only black/blue/red/green? If so, the most straightforward compression would be to make more efficient use of each pixel. Pixels use a fixed amount of space regardless of their color, so chances are you are using 12x as many pixels as you really need: four colors fit in 2 bits, while a standard 24-bit RGB pixel can hold 12 such 2-bit values.
A simple way to do that would be to label the pixels with the following base-4 digits:
Black = 0
Red = 1
Green = 2
Blue = 3
Example:
The first four colors of the image seem to be Blue-Red-Blue-Blue. This is 3133 in base 4, which is simply DF in base 16 or 223 in base 10. That value is enough to define the red channel of the new pixel; the next 4 colors define the green channel and the final 4 the blue channel, turning 12 pixels into a single pixel.
Beyond that you'll probably want to look into more conventional compression software.

Error control coding for a practical application

I’m doing a project where a device is built to measure the girth of a rubber tree in a rubber plantation.
I need to give an identity to each tree to store the measurements of each tree.
The ID of each tree contains 33 bits (in binary). For error detection and correction I'm hoping to encode this 33-bit word into a codeword (using an error control coding technique) and generate a 2D matrix (a colour matrix, with red and cyan squares representing 1s and 0s). The 2D matrix will represent the coded word and will be pasted on the trunk of the tree. A camera (the device) will be used to take an image of the 2D matrix; the code will be decoded and the ID of the tree recovered.
I'm looking for the best scheme to implement this. I thought of cyclic codes, but since the data word is 33 bits, cyclic codes seem a bit complicated.
Can someone please suggest the best way (or at least a good way) to implement this?
Additional info: the image is taken in a forest environment (low-light conditions). A colour matrix is to be used because of the environment: the bark of the tree is dark, so a black-and-white matrix would not be appropriate.
One way to do it is to use 2D parity check codes. The resulting codeword is a matrix, and it has single error correction (SEC) capability.
Since your information part (tree ID) has 33 bits, you may need to add a few dummy bits to make the information part a rectangle (say, 6x6). If a tree's ID is 1010_1010_1010_1010_1010_1010_1010_1010_1, then by adding 3 more 0s we have:
1 0 1 0 1 0 | 1
1 0 1 0 1 0 | 1
1 0 1 0 1 0 | 1
1 0 1 0 1 0 | 1
1 0 1 0 1 0 | 1
1 0 1 0 0 0 | 0
—————————————
0 0 0 0 1 0 1
Then you get an (n, k, d) = (49, 36, 3) code, which corrects single-bit errors.