Sorry if this is a recurring question, but I can't find the right keywords for this search.
I need to develop a system for visual recognition of labels attached to products from a warehouse. I'm using a fixed-focus camera, so the idea is to use a label with a 6-character alphanumeric code printed in a large font. The system would then be responsible for performing ROI extraction and applying OCR to recognize the code in the scene.
My main problem is the ROI extraction part. I tried to use template matching, but due to the difficulties with scale and rotation, it doesn't seem to be the right technique for the application. I also tried to use feature matching, but the results are still insufficient.
My question is: how could I design the label to facilitate ROI extraction? Could I use something like AprilTags to simplify the homography?
Thanks in advance!
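For illustration, here is a minimal sketch of what the fiducial route could look like, assuming four AprilTags printed at the corners of the label and the pupil-apriltags Python package; the tag family, tag IDs, and label dimensions are all assumptions, not a definitive design.

```python
# Hypothetical sketch: four AprilTags printed at the corners of the label.
# Assumes the pupil-apriltags package (pip install pupil-apriltags) and OpenCV.
import cv2
import numpy as np
from pupil_apriltags import Detector

LABEL_W, LABEL_H = 600, 200          # output ROI size in pixels (assumed layout)
TAG_IDS = [0, 1, 2, 3]               # tag IDs printed at TL, TR, BR, BL (assumption)
DST = np.float32([[0, 0], [LABEL_W, 0], [LABEL_W, LABEL_H], [0, LABEL_H]])

detector = Detector(families="tag36h11")

def extract_label_roi(bgr):
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    detections = {d.tag_id: d for d in detector.detect(gray)}
    if not all(t in detections for t in TAG_IDS):
        return None                  # not all corner tags visible
    # Use the tag centres as the four corners of the label quadrilateral.
    src = np.float32([detections[t].center for t in TAG_IDS])
    H = cv2.getPerspectiveTransform(src, DST)
    return cv2.warpPerspective(bgr, H, (LABEL_W, LABEL_H))

# roi = extract_label_roi(cv2.imread("frame.jpg"))  # then run OCR on `roi`
```

With the tag centres pinned down, the homography reduces to a single getPerspectiveTransform call, and the OCR only ever sees the rectified label.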
Related
I'm trying to use Microsoft's Computer Vision OCR API to get information from a table in an image. The trouble I'm having is that the data returned typically has all sorts of quirky regions going on, and I'm attempting to piece all the regions together to get full lines of readable and parseable text.
The only way I've thought of that makes any sense is to use the orientation to rotate the bounding box coordinates and check which "lines" are within a given percentage of the height of another given bounding box - perhaps 20% or so.
This is literally the only way I've thought of so far and I'm beginning to think I'm overcomplicating this; is there a standard way that people tend to build up OCR regions to get readable text?
There is no standard way as such. However, depending on the requirement, people do go with the option of regex.
Azure OCR returns the JSON response as words and their bounding boxes. From there on, it is up to you to interpret the result; the OCR APIs do not help with this task.
As a start, regex is a great way to parse text data. Or try a machine learning approach as described in this reddit post: https://www.reddit.com/r/MachineLearning/comments/53ovp9/extracting_a_total_cost_from_ocr_paper_receipt/
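For the line-reconstruction part of the question, here is one hedged sketch, assuming you have already flattened the JSON into a list of words with (x, y, w, h) bounding boxes (the field names are illustrative, not the exact Azure schema): merge words whose vertical centres fall within a fraction of the box height, then order each line left to right.

```python
# Illustrative sketch: group OCR words into lines by vertical overlap.
# `words` is assumed to be a list of dicts like {"text": ..., "box": (x, y, w, h)}
# already extracted from the OCR JSON; this is not the exact Azure schema.

def group_into_lines(words, tol=0.5):
    """Merge words whose vertical centres lie within `tol` * box height."""
    words = sorted(words, key=lambda w: w["box"][1])   # sort top-to-bottom
    lines = []
    for w in words:
        x, y, h = w["box"][0], w["box"][1], w["box"][3]
        cy = y + h / 2.0
        for line in lines:
            if abs(cy - line["cy"]) <= tol * max(h, line["h"]):
                line["words"].append(w)
                break
        else:
            lines.append({"cy": cy, "h": h, "words": [w]})
    # sort each line left-to-right and join the text
    return [" ".join(t["text"] for t in sorted(l["words"], key=lambda t: t["box"][0]))
            for l in lines]
```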
I would like to detect labels with barcodes. These labels all look very similar but each label has a different barcode/wording on it.
I have tried template matching but to no avail.
One constraint I am faced with is that there are other barcodes in the image, but I only want labels with this particular barcode format.
Could anyone suggest some other algorithms?
Thank you!
I don't have much experience with barcodes, but you might want to have a look at the Canny edge detector. It's quite simple to understand and should help you in your task.
If I had to do anything like this, I would first try to rotate the image until the lines are vertical, use canny and then extract the 'on/off' pattern of each set of bars.
This PyImageSearch page is a good starting point.
To reject codes of other formats, you can use a barcode reader API, or come up with some rules regarding the format.
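For reference, a rough sketch of the gradient-plus-morphology approach described on that PyImageSearch page (the kernel sizes and thresholds are guesses and would need tuning for your images): barcodes have strong horizontal gradients and weak vertical gradients, so subtracting the two, blurring, thresholding, and closing leaves a blob you can box.

```python
# Rough sketch of gradient-based barcode localisation (thresholds are assumptions).
import cv2
import numpy as np

def find_barcode_box(bgr):
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    # Barcodes have strong x-gradients and weak y-gradients.
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=-1)   # ksize=-1 -> Scharr kernel
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=-1)
    grad = cv2.convertScaleAbs(cv2.subtract(gx, gy))
    blurred = cv2.blur(grad, (9, 9))
    _, thresh = cv2.threshold(blurred, 225, 255, cv2.THRESH_BINARY)
    # Close gaps between the bars, then clean up with erosions/dilations.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (21, 7))
    closed = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
    closed = cv2.erode(closed, None, iterations=4)
    closed = cv2.dilate(closed, None, iterations=4)
    cnts, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not cnts:
        return None
    c = max(cnts, key=cv2.contourArea)
    return cv2.minAreaRect(c)   # rotated rectangle around the barcode candidate
```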
I was asked to recognize a logo in an image using OpenCV. The lecturer told me that I don't have to do logo detection, only logo recognition. I am using OpenCV in C++. What is the easiest way to do it?
PS: I'm a newbie in computer vision.
It largely depends on your kind of images.
If your logo occupies say 90% of the image, you don't need detection, since you are probably good with color histograms.
If the logo is small compared to the image, you should "find" the logo, in order to focus your comparison on that and not on the background clutter.
Could there be multiple logos in the same image?
Is the logo always fully visible?
Is the logo rigid, or could it be deformed? (Think, for example, of a logo on a shirt or a small bottle.)
Assuming that you have a single complete rigid logo to find, the simplest thing to try is template matching.
A more accurate approach is to match descriptors.
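As a rough illustration of the descriptor-matching route, a sketch using ORB (chosen because it is patent-free and ships with OpenCV; the ratio threshold is an assumption):

```python
# Illustrative ORB descriptor matching between a reference logo and a scene image.
import cv2

def count_logo_matches(logo_gray, scene_gray, ratio=0.75):
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(logo_gray, None)
    kp2, des2 = orb.detectAndCompute(scene_gray, None)
    if des1 is None or des2 is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = matcher.knnMatch(des1, des2, k=2)
    # Lowe's ratio test keeps only distinctive matches.
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good)   # many good matches -> the logo is probably present
```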
You can also see a related topic on SO here
Other more robust approaches would require to build constellations of keypoints on your reference logo, and match those constellations on the target image.
Last, but not least, have fun on Google!
I agree with #Miki: you need to do template matching. My recommendation is to use the sum of squared differences and restrict yourself to a rigid transformation; you can find a lot of information here. The last link is one of the best books I've read: it is simple to understand and covers most of the equations step by step.
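A minimal OpenCV sketch of that suggestion, using the normalised sum of squared differences (for SQDIFF methods the best match is the minimum of the response map; file names and the acceptance threshold are placeholders):

```python
# Minimal template-matching sketch using normalised sum of squared differences.
import cv2

scene = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)      # file names are placeholders
template = cv2.imread("logo.png", cv2.IMREAD_GRAYSCALE)

result = cv2.matchTemplate(scene, template, cv2.TM_SQDIFF_NORMED)
min_val, _, min_loc, _ = cv2.minMaxLoc(result)              # SQDIFF: lower = better
h, w = template.shape
top_left = min_loc
bottom_right = (top_left[0] + w, top_left[1] + h)
print("best SSD score:", min_val, "at", top_left)
# A low min_val (e.g. < 0.1) suggests a match; that threshold is an assumption.
```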
Currently I am working with OpenCV. I have an image containing text, and I want to find out the style (bold, italic) of that text. How can I achieve this? Thanks
What you can do (assuming a letter-by-letter approach) is:
Using segmentation techniques, first segment out the letters.
Using the segmented letters, compare against your own dataset of pre-segmented/pre-filtered letters to find the font style.
Comparison can be done using various features: SIFT, SURF, BRISK, Harris corners, template matching, or something you come up with on your own. My best guess would be to go with Haar features and training.
Once you have a set of features for a letter, matching the closest candidate against your pre-filtered dataset can be achieved using different techniques such as k-NN, Euclidean distance, etc. (see the sketch below). If you use Haar features, OpenCV can help a lot with retrieval.
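A hedged sketch of the segment-then-compare idea, using plain contour segmentation and a nearest-neighbour comparison on resized letter patches instead of Haar training (the patch size, area filter, and reference set are assumptions):

```python
# Sketch: segment letters by contours, then match each against a small labelled
# reference set of letter images by nearest neighbour on the raw pixels.
import cv2
import numpy as np

PATCH = (32, 32)   # assumed normalised letter size

def segment_letters(gray):
    _, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    cnts, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in cnts if cv2.contourArea(c) > 20]
    boxes.sort(key=lambda b: b[0])                      # left-to-right
    return [cv2.resize(bw[y:y+h, x:x+w], PATCH) for (x, y, w, h) in boxes]

def classify_style(letter, references):
    """references: list of (patch, style) pairs, e.g. pre-filtered 'bold'/'italic' letters."""
    dists = [(np.linalg.norm(letter.astype(np.float32) - ref.astype(np.float32)), style)
             for ref, style in references]
    return min(dists)[1]                                 # style of the closest reference
```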
Eventually you might end up doing some OCR that includes font style.
OpenCV has a set of built in feature descriptors which you can read here
Good Luck!
This might help you. I know it's not exact, but it will suffice for my similar project.
"Typefont is an experimental library that detects the font of a text in a image."
https://github.com/Vasile-Peste/Typefont
Does anyone know of a C++ library for taking an image and performing image recognition on it such that it can find letters based on a given font and/or font height? Even one that doesn't let you select a font would be nice (e.g., readLetters(Image image)).
I've been looking into this a lot lately. Your best bet is simply Tesseract. If you need layout analysis on top of the OCR, then go with OCRopus (which in turn uses Tesseract to do the OCR). Layout analysis refers to being able to detect the position of text on the image and do things like line segmentation, block segmentation, etc.
I've found some really good tips through experimentation with Tesseract that are worth sharing. Basically I had to do a lot of preprocessing for the image.
Upsize/Downsize your input image to 300 dpi.
Remove color from the image. Grey scale is good. I actually used a dither threshold and made my input black and white.
Cut out unnecessary junk from your image.
For all three above I used netpbm (a set of image manipulation tools for Unix) to get to the point where I was getting pretty much 100 percent accuracy for what I needed.
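If you are in Python rather than netpbm-land, a rough equivalent of the three steps above might look like this, assuming OpenCV and the pytesseract wrapper are installed (the resize factor and crop are placeholders, since the right values depend on your source resolution):

```python
# Rough Python equivalent of the preprocessing above, assuming OpenCV + pytesseract.
import cv2
import pytesseract

img = cv2.imread("label.jpg")                              # placeholder file name
# 1) Upscale so the text ends up at roughly 300 dpi (factor is an assumption).
img = cv2.resize(img, None, fx=2.0, fy=2.0, interpolation=cv2.INTER_CUBIC)
# 2) Remove colour and binarise (Otsu threshold as a stand-in for dithering).
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# 3) Crop away junk around the text (coordinates are placeholders).
# bw = bw[50:400, 100:900]
text = pytesseract.image_to_string(bw)
print(text)
```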
If you have a highly customized font and go with Tesseract alone, you have to "train" the system -- basically you have to feed it a bunch of training data. This is well documented on the tesseract-ocr site. You essentially create a new "language" for your font and pass it in with the -l parameter.
The other training mechanism I found was with OCRopus using neural net (bpnet) training. It requires a lot of input data to build a good statistical model.
In terms of invoking them, Tesseract and OCRopus are both C++. It won't be as simple as ReadLines(Image), but there is an API you can check out. You can also invoke them via the command line.
While I cannot recommend one in particular, the term you are looking for is OCR (Optical Character Recognition).
There is tesseract-ocr which is a professional library to do this.
From their web site:
The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but it is probably one of the most accurate open source OCR engines available
I think what you want is Conjecture. Used to be the libgocr project. I haven't used it for a few years but it used to be very reliable if you set up a key.
The Tesseract OCR library gives pretty accurate results; it's a C and C++ library.
My initial results were around 80% accurate, but after applying pre-processing to the images before supplying them for OCR, the results were around 95% accurate.
What the pre-processing was:
1) Binarize the bitmap (B&W worked better for me). How it could be done
2) Resample your image to 300 dpi
3) Save your image in a lossless format, such as LZW TIFF or CCITT Group 4 TIFF.
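For what it's worth, a small Pillow sketch of those three steps (the file names and the fixed 128 threshold are assumptions; dithering or an Otsu threshold may work better for your scans):

```python
# Sketch of steps 1-3 with Pillow; the 300 dpi value is written into the TIFF metadata,
# while resizing so the text is physically large enough is still up to you.
from PIL import Image

img = Image.open("scan.jpg")                     # placeholder file name
bw = img.convert("L").point(lambda p: 255 if p > 128 else 0, mode="1")  # 1) binarise
# 2) 300 dpi is recorded when saving; 3) CCITT Group 4 is lossless for 1-bit images.
bw.save("scan.tif", compression="group4", dpi=(300, 300))
```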