I am using OpenCV 2.4 and Tesseract 3
I am trying to do OCR on a 14-segment display captured from a webcam.
The issue is that when I trained Tesseract, I had to do enough erosion/dilation to fill the gaps between the segments. But the image I read from the webcam needs to be pre-processed to remove noise; to do this I use erosions and dilations, and the resulting picture doesn't have its segments linked:
What I trained Tesseract with (that's the "V" letter): http://i.imgur.com/NbmVqkb.png (segments are all linked)
What I feed Tesseract with: http://i.imgur.com/0E4iXXk.png (some segments are linked, some aren't)
The OCR result is always different and can be "OVO" as well as "EB". I thought that if I trained Tesseract on something closer to what I am actually reading (non-linked segments) it might work better, but Tesseract can't be trained on characters with gaps like this (it says "Empty page").
Does anyone have any idea how to solve this?
I tried increasing the size of the erosion/dilation, but then other letters aren't recognized (B and D get confused) and the overall results are worse.
Thank you!
EDIT: Basically, what I'd need is either a way to link the segments together to make the character easier for Tesseract to read, OR a way to train Tesseract with unlinked segments (from what I've seen, that isn't possible).
Isn't it possible to skip Tesseract for this? It looks like you already have a way of partitioning your image into separate characters. Then you could number the segments of your display, perhaps as shown here: http://www.randomdata.nl/wiki/index.php/Adruino_14_segment_LED_board, and just decide which of your segments are currently lit. Then you can match that against the known segment combinations for all characters with some form of nearest-distance algorithm to find the best match.
Sticking to the scheme linked above, your V could perhaps be encoded as follows (a matching sketch follows the table):
segment number: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
switched on: 0 1 1 0 0 0 1 0 1 0 0 0 0 0
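For the matching step, here is a minimal C++ sketch; only the 'V' pattern comes from the table above, while the 'B' pattern and the observed segments are made-up placeholders:

#include <bitset>
#include <initializer_list>
#include <iostream>
#include <map>

// Build a 14-bit pattern from a list of lit segment numbers (segment 1 -> bit 0).
static std::bitset<14> fromSegments(std::initializer_list<int> lit) {
    std::bitset<14> b;
    for (int s : lit) b.set(s - 1);
    return b;
}

int main() {
    // Known characters and the segments that light up for them.
    std::map<char, std::bitset<14>> patterns;
    patterns['V'] = fromSegments({2, 3, 7, 9});             // from the table above
    patterns['B'] = fromSegments({1, 2, 3, 4, 8, 11, 14});  // placeholder pattern

    // Segments detected as lit in the current frame (here one segment is missing).
    std::bitset<14> observed = fromSegments({2, 3, 9});

    // Pick the character whose pattern differs in the fewest segments
    // (Hamming distance = XOR, then count the set bits).
    char best = '?';
    std::size_t bestDist = 15;
    for (const auto& entry : patterns) {
        std::size_t d = (entry.second ^ observed).count();
        if (d < bestDist) { bestDist = d; best = entry.first; }
    }
    std::cout << "Best match: " << best << " (distance " << bestDist << ")\n";
    return 0;
}

With a full table of the 14-bit codes, an unlinked or partially detected character still lands on the nearest legal pattern instead of a wrong OCR string.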
I am still a beginner at coding. I am currently working on a C/C++ program that determines the pixel position of a defined mark (a black circle with white surroundings) in a photo.
I made a mask from the mark and a vector that contains every pixel value of the mask as its elements (using Magick++, I summed the Red, Green and Blue values). The vector contains approximately 10,000 values, since the mask is 100x100 px. I also used threshold functions to simplify the image.
Then I made a grid that does the same for the picture in which I want to find the coordinates of the mark. It is basically a loop that goes through the image, and once the program knows the pixel values at a grid position it immediately compares them with the mask. The main idea is to find the lowest difference between the mask and one of the grid positions.
The problem, however, is that evaluating every grid position takes a huge amount of time (the image is 1920x1080 px, so more than 2 million vectors of 10,000 values each). I decided to step the grid not by every pixel but, for example, by every 10th column and row, and then around the best correlation from that pass I selected an area where I ran the every-pixel loop. But this still takes a lot of time.
I would like to ask whether there is some way of improving this method for better (faster) results, or whether this whole idea is not time-efficient and I should use a different approach.
Thanks for any advice!
Edit: The program will be used to process multiple images, and all of them will be the same size. This is the picture after thresholding; the mark is the big black dot.
Image
The idea I find interesting is a pyramidal scheme, or progressive refinement: you find the spot in a reduced-size image, then search only a small rectangle in the full-size image.
If you reduce your image by 2 in each dimension, you reduce the time by a factor of 4, plus some search effort in the larger image.
This has some problems: I expect the reduction will affect accuracy, and you might miss the spot.
You also have to scale the sample (template) down by the same factor, so in this case you create a half-size template. As you keep halving, the template gets blurred into the surrounding objects, and at some point it is no longer a valid template; after halving once, I guess the dot still has a couple of pixels around it.
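For what it's worth, here is a rough sketch of that coarse-to-fine idea using OpenCV rather than Magick++; the file names and the search margin are placeholders, and the mark is assumed not to sit right at the image border:

#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
    cv::Mat image = cv::imread("photo.png", cv::IMREAD_GRAYSCALE);   // placeholder names
    cv::Mat templ = cv::imread("mask.png", cv::IMREAD_GRAYSCALE);

    // Coarse pass: match a half-size template against a half-size image.
    cv::Mat smallImg, smallTempl, score;
    cv::pyrDown(image, smallImg);
    cv::pyrDown(templ, smallTempl);
    cv::matchTemplate(smallImg, smallTempl, score, cv::TM_SQDIFF);
    cv::Point coarse;
    cv::minMaxLoc(score, 0, 0, &coarse, 0);          // TM_SQDIFF: best match = minimum

    // Fine pass: search only a small window around the coarse hit at full resolution.
    const int margin = 8;
    cv::Rect roi(coarse.x * 2 - margin, coarse.y * 2 - margin,
                 templ.cols + 2 * margin, templ.rows + 2 * margin);
    roi &= cv::Rect(0, 0, image.cols, image.rows);   // clip to the image
    cv::matchTemplate(image(roi), templ, score, cv::TM_SQDIFF);
    cv::Point fine;
    cv::minMaxLoc(score, 0, 0, &fine, 0);

    std::cout << "mark top-left at " << fine.x + roi.x << "," << fine.y + roi.y << "\n";
    return 0;
}

The same structure works with your Magick++ difference sums: run the coarse loop on the half-size data, then the exhaustive loop only inside the small window.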
As you haven't specified a tool or OS, I will choose ImageMagick which is installed on most Linux distros and is available for OSX and Windows. I am just using it at the command-line here but there are C, C++, Python, Perl, PHP, Ruby, Java and .Net bindings available.
I would use a "Connect Components Analysis" or "Blob Analysis" like this:
convert image.png -negate \
-define connected-components:area-threshold=1200 \
-define connected-components:verbose=true \
-connected-components 8 -auto-level result.png
I have inverted your image with -negate because in morphological operations, the foreground is usually white rather than black. I have excluded blobs smaller than 1200 pixels because your circles seem to have a radius of 22 pixels which makes for an area of 1520 pixels (Pi * 22^2).
That gives this output, which means 7 blobs - one per line - with the bounding box and area of each:
Objects (id: bounding-box centroid area mean-color):
0: 1358x1032+0+0 640.8,517.0 1296947 gray(0)
3: 341x350+1017+287 1206.5,468.9 90143 gray(255)
106: 64x424+848+608 892.2,829.3 6854 gray(255)
95: 38x101+44+565 61.5,619.1 2619 gray(255)
49: 17x145+1341+379 1350.3,446.7 2063 gray(0)
64: 43x43+843+443 864.2,464.1 1451 gray(255)
86: 225x11+358+546 484.7,551.9 1379 gray(255)
Note that, as your circle is 42x42 pixels, you will be looking for a blob that is square-ish and close to that size, so I am looking at the second-to-last line. I can draw that in red on your original image like this:
convert image.png -fill none -stroke red -draw "rectangle 843,443 886,486" result.png
Also, note that as you are looking for a circle, you would expect the area to be pi * r^2 or around 1500 pixels and you can check that in the penultimate column of the output.
That runs in 0.4 seconds on a reasonable-spec iMac. Note that you could divide the image into 4 strips and run them in parallel to speed things up, like this:
#!/bin/bash
# Split image into 4 (maybe should allow 23 pixels overlap)
convert image.png -crop 1x4# tile-%02d.mpc
# Do Blob Analysis on 4 strips in parallel
for f in tile-*mpc; do
convert "$f" -negate \
-define connected-components:area-threshold=1200 \
-define connected-components:verbose=true \
-connected-components 8 info: &
done
# Wait for all 4 to finish
wait
That runs in around 0.14 seconds.
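If you would rather do the same blob analysis in C++ code instead of shelling out to ImageMagick, OpenCV (3.x or later) has connectedComponentsWithStats; the file name below is a placeholder and the area/size checks simply mirror the numbers above:

#include <opencv2/opencv.hpp>
#include <cstdlib>
#include <iostream>

int main() {
    cv::Mat img = cv::imread("image.png", cv::IMREAD_GRAYSCALE);
    cv::Mat bin;
    cv::threshold(img, bin, 128, 255, cv::THRESH_BINARY_INV);   // make the foreground white, like -negate

    cv::Mat labels, stats, centroids;
    int n = cv::connectedComponentsWithStats(bin, labels, stats, centroids, 8);

    for (int i = 1; i < n; ++i) {                                // label 0 is the background
        int w    = stats.at<int>(i, cv::CC_STAT_WIDTH);
        int h    = stats.at<int>(i, cv::CC_STAT_HEIGHT);
        int area = stats.at<int>(i, cv::CC_STAT_AREA);
        // Keep blobs that are roughly square and close to the expected circle area (~1500 px).
        if (std::abs(w - h) <= 4 && area > 1200 && area < 1800)
            std::cout << "candidate at " << stats.at<int>(i, cv::CC_STAT_LEFT) << ","
                      << stats.at<int>(i, cv::CC_STAT_TOP)
                      << " size " << w << "x" << h << " area " << area << "\n";
    }
    return 0;
}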
I want to create an image parser that reads an image containing the following:
1. Straight Lines
2. Circles
3. Arcs
4. Text
I am open to solutions for any image format: JPEG, BMP, or PDF.
I have looked at the QImage documentation. It gives me pixel data that I can store as a 2D matrix. For now I shall assume there are only two colours, black and white: white represents an empty pixel and black represents a drawn pixel.
So I will have a sparse matrix like this:
0 1 1 1 0 0 0
0 0 0 0 0 0 1
0 1 1 0 0 0 1
1 0 0 1 0 0 1
1 0 0 1 0 0 0
0 1 1 0 0 0 0
Now I want to decode this matrix and search for the elements. Searching for horizontal and vertical lines is easy because for each element I can just scan its neighbouring row elements and column elements.
How can I search for other elements (angled lines, circles, arcs and possibly text)?
For text, I read that QImage has a text() function, but I don't know for which input file types it works.
Is there any other library that I can consider?
Please note that I just want to be able to read the image; no further processing needs to be done.
Is there any other way I can accomplish this? Or am I being too ambitious?
Thanks
Take a look at the OpenCV library.
It provides most of the standard algorithms used in image detection and vision and the code quality of its implementation is quite high in general.
Notice though that this is a very difficult problem in general, so you will probably need to do a fair amount of research before getting satisfactory solutions.
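For the lines and circles specifically, OpenCV's Hough transforms are the usual starting point (arcs are harder, and for the text you would need an OCR engine on top). A minimal sketch, with a placeholder file name and thresholds you would have to tune for your drawings (OpenCV 3 constant names):

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main() {
    cv::Mat img = cv::imread("drawing.png", cv::IMREAD_GRAYSCALE);
    cv::Mat edges;
    cv::Canny(img, edges, 50, 150);

    // Straight line segments at any angle.
    std::vector<cv::Vec4i> lines;
    cv::HoughLinesP(edges, lines, 1, CV_PI / 180, 50, 30, 5);

    // Circles (the radius range and thresholds are guesses to tune).
    std::vector<cv::Vec3f> circles;
    cv::HoughCircles(img, circles, cv::HOUGH_GRADIENT, 1, 20, 100, 30, 5, 100);

    std::cout << lines.size() << " line segments, " << circles.size() << " circles\n";
    return 0;
}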
One interesting way of tackling this would be with machine learning systems, such as neural networks and genetic algorithms. Neural nets in particular are very good at pattern matching and are often seen being used for tasks such as handwriting recognition.
There's a lot of information on this if you search for it. Here's one such article that is an introduction to NNs.
If your input images are always black and white, I don't think it would be too difficult to adapt a code example to get it working.
I suggest the Viola-Jones object detection algorithm.
Though the approach is usually applied to face detection, the original article discusses general object detection, which would cover your text, circles and lines.
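If you go the Viola-Jones route, OpenCV ships an implementation as cv::CascadeClassifier. You would have to train a cascade on examples of the shapes you care about; the cascade and image file names below are purely hypothetical:

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main() {
    // Hypothetical cascade trained on your own shape/text samples.
    cv::CascadeClassifier detector("my_shapes_cascade.xml");
    cv::Mat img = cv::imread("drawing.png", cv::IMREAD_GRAYSCALE);

    std::vector<cv::Rect> hits;
    detector.detectMultiScale(img, hits, 1.1, 3);

    for (size_t i = 0; i < hits.size(); ++i)
        std::cout << "detection at " << hits[i].x << "," << hits[i].y
                  << " (" << hits[i].width << "x" << hits[i].height << ")\n";
    return 0;
}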
I am working on a project to losslessly compress a specific style of BMP images that look like this
I have thought about doing pattern recognition to find repetitive blocks of N x N pixels, but I feel the execution time won't be fast enough.
Any suggestions?
EDIT: I have access to the dataset that created these images too, I just use the image to visualize my data.
Optical illusions make it hard to tell for sure, but are the colors only black/blue/red/green? If so, the most straightforward compression would be simply to make more efficient use of pixels. Pixels use a fixed amount of space regardless of what color they are, so chances are you are using 12x as many pixels as you really need, since a pixel can represent far more colors than just those four.
A simple way to do that would be to label the pixels with the following base-4 numbers:
Black = 0
Red = 1
Green = 2
Blue = 3
Example:
The first four colors of the image seem to be Blue-Red-Blue-Blue. This is equal to 3133 in base 4, which is simply DF in base 16 or 223 in base 10. This is enough to define the red channel of the new pixel. The next 4 colors define the green channel and the final 4 define the blue channel, thus turning 12 pixels into a single pixel.
Beyond that you'll probably want to look into more conventional compression software.
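For what it's worth, a minimal sketch of the packing idea above, assuming the image really does contain only those four colours; each pixel becomes a 2-bit index and four indices go into one byte, so 12 pixels fit in the 3 bytes of a single 24-bit RGB pixel:

#include <cstdint>
#include <iostream>
#include <vector>

// 0 = Black, 1 = Red, 2 = Green, 3 = Blue (as in the table above).
// The first pixel of each group goes into the most significant two bits.
std::vector<std::uint8_t> pack(const std::vector<std::uint8_t>& indices) {
    std::vector<std::uint8_t> out((indices.size() + 3) / 4, 0);
    for (std::size_t i = 0; i < indices.size(); ++i)
        out[i / 4] |= (indices[i] & 0x3) << (2 * (3 - i % 4));
    return out;
}

std::vector<std::uint8_t> unpack(const std::vector<std::uint8_t>& packed, std::size_t count) {
    std::vector<std::uint8_t> out(count);
    for (std::size_t i = 0; i < count; ++i)
        out[i] = (packed[i / 4] >> (2 * (3 - i % 4))) & 0x3;
    return out;
}

int main() {
    std::vector<std::uint8_t> pixels = {3, 1, 3, 3};      // Blue, Red, Blue, Blue
    std::vector<std::uint8_t> packed = pack(pixels);
    std::cout << int(packed[0]) << "\n";                  // prints 223, the base-4 example above
    return 0;
}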
I want to merge 2 images. How can I remove the area that is the same in both images?
Can you suggest an algorithm to solve this problem? Thanks.
The two images are screenshots. They have the same width, and image 1 is always above image 2.
When two images have the same width and there is no X-offset on the left side, this shouldn't be too difficult.
You should create two vectors of integers and store the CRC of each pixel row in the corresponding vector element. After doing this for both pictures, look up the CRC of the first line of the lower image in the first vector; that is the offset in the upper picture. Then check that all following CRCs from both pictures are identical. If they aren't, look up the next occurrence of the initial CRC in the upper image and try again.
Once you have checked that the CRCs of both pictures are identical when you apply the offset, you can use the bit-blit function of your graphics framework to build the composite picture.
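A rough sketch of that row-hashing idea; how you obtain the raw bytes of each pixel row depends on your image library, and std::hash is only a stand-in for a real CRC:

#include <cstddef>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Hash every pixel row of an image given as a vector of row byte-strings.
static std::vector<std::size_t> rowHashes(const std::vector<std::string>& rows) {
    std::vector<std::size_t> h;
    h.reserve(rows.size());
    for (std::size_t i = 0; i < rows.size(); ++i)
        h.push_back(std::hash<std::string>()(rows[i]));
    return h;
}

// Returns the row index in the upper image where the lower image starts,
// or -1 if no overlap is found.
static int findOverlap(const std::vector<std::string>& upper,
                       const std::vector<std::string>& lower) {
    std::vector<std::size_t> hu = rowHashes(upper);
    std::vector<std::size_t> hl = rowHashes(lower);
    for (std::size_t start = 0; start < hu.size(); ++start) {
        if (hu[start] != hl[0]) continue;                    // candidate offset
        bool ok = true;
        for (std::size_t i = 1; start + i < hu.size() && i < hl.size(); ++i)
            if (hu[start + i] != hl[i]) { ok = false; break; }
        if (ok) return static_cast<int>(start);              // rows match down to the bottom
    }
    return -1;
}

int main() {
    std::vector<std::string> upper = {"row0", "row1", "row2", "row3"};   // fake row data
    std::vector<std::string> lower = {"row2", "row3", "row4"};
    std::cout << "overlap starts at upper row " << findOverlap(upper, lower) << "\n";  // prints 2
    return 0;
}

The composite image is then the rows of the upper image above the found offset, followed by every row of the lower image.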
I haven't come across something similar before but I think the following might work:
Convert both to grey-scale.
Enhance the contrast, the grey box might become white for example and the text would become more black. (This is just to increase the confidence in the next step)
Apply some threshold, converting the pictures to black and white.
Afterwards, you could find the similar areas (and thus the overlap offset) with a good degree of confidence. To find the similar parts, you could use Harper's method (which is good, but I don't know how reliable it would be without the filtering described above), or you could apply some DSP operation(s) such as convolution.
Hope that helps.
If your images are the same width and image 1 is always on top, I don't see how hard it could be.
Just store the bytes of the last line of image 1.
Then, from the first line to the last line of image 2, run this test:
If the current line of image 2 is not equal to the last line of image 1 -> continue
else -> break the loop
Then define a new byte container for your new image:
Just store all the lines of image 1, plus all the lines of image 2 starting at (the found line + 1).
What will make you sweat here is finding the libraries to manipulate all these data structures, but after a bit of linking and documentation digging you should be able to implement it easily.
I'm somewhat new to OpenGL though I'm fairly sure my problem lies in the pixel format being used, or how my texture is being generated...
I'm drawing a texture onto a flat 2D quad using a 16-bit RGB5_A1 pixel format, though I don't make use of any alpha at this stage. The problem I'm having is that each pair of horizontal pixel values has been swapped.
That is... if the pixel positions should be in this order (assume a 4x2 image)
0 1 2 3
4 5 6 7
they are instead drawn as
1 0 3 2
5 4 7 6
Or, more clearly from this image (below).
Left is what I get... Right is what I should get.
The question is... how have I ended up with this? Is there something wrong with the pixel format? Unlikely, since the colours all appear correct, and I would expect all kinds of nastiness if it were down to endianness. Suggestions greatly appreciated.
Update: Turns out the problem was in my source renderer. Interestingly, I've avoided the problem entirely by using 32-bit textures (haven't tried 24-bit at this point).
This may be unrelated, and you have already found a workaround, but it could be related to OpenGL unpack alignment. Have you tried the following call? It instructs OpenGL to align every image row to 1 byte (the default is 4).
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
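For context, a minimal sketch of where that call sits relative to the texture upload; the texture id, dimensions and pixel pointer are placeholders from your own code, and a 16-bit 5_5_5_1 upload is assumed to match your RGB5_A1 format:

#include <GL/gl.h>

void uploadTexture(GLuint textureId, GLsizei width, GLsizei height, const void* pixels) {
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);   // rows are tightly packed, not padded to 4 bytes
    glBindTexture(GL_TEXTURE_2D, textureId);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB5_A1, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_SHORT_5_5_5_1, pixels);
}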