C++ Library for image recognition: images containing words to string

Does anyone know of a C++ library for taking an image and performing image recognition on it, such that it can find letters based on a given font and/or font height? Even one that doesn't let you select a font would be nice (e.g. readLetters(Image image)).

I've been looking into this a lot lately. Your best bet is simply Tesseract. If you need layout analysis on top of the OCR, then go with Ocropus (which in turn uses Tesseract to do the OCR). Layout analysis refers to being able to detect the position of text on the image and do things like line segmentation, block segmentation, etc.
I've found some really good tips through experimentation with Tesseract that are worth sharing. Basically I had to do a lot of preprocessing for the image.
Upsize/Downsize your input image to 300 dpi.
Remove color from the image. Grey scale is good. I actually used a dither threshold and made my input black and white.
Cut out unnecessary junk from your image.
For all three above I used netpbm (a set of image manipulation tools for unix) to get to the point where I was getting pretty much 100 percent accuracy for what I needed.
If you have a highly customized font and go with tesseract alone, you have to "Train" the system; basically you have to feed it a bunch of training data. This is well documented on the tesseract-ocr site. You essentially create a new "language" for your font and pass it in with the -l parameter.
The other training mechanism I found was with Ocropus, using neural net (bpnet) training. It requires a lot of input data to build a good statistical model.
In terms of invoking them, Tesseract and Ocropus are both C++. It won't be as simple as ReadLines(Image), but there is an API you can check out. You can also invoke them via the command line.
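To give a feel for the API side, here is a minimal, untested sketch of calling Tesseract from C++. It assumes the Leptonica image library (which Tesseract uses for image I/O), the default English traineddata, and a placeholder input file name:

#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <cstdio>

int main() {
    tesseract::TessBaseAPI api;
    // Initialize with the default tessdata path and English ("eng");
    // pass your own trained "language" here instead if you trained a custom font.
    if (api.Init(nullptr, "eng") != 0) {
        fprintf(stderr, "Could not initialize tesseract\n");
        return 1;
    }
    Pix *image = pixRead("input.png");   // placeholder file name
    api.SetImage(image);
    char *text = api.GetUTF8Text();      // run recognition, get the result as UTF-8
    printf("%s", text);
    delete[] text;
    api.End();
    pixDestroy(&image);
    return 0;
}

Linking is typically something like -ltesseract -llept, but check your install.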

While I cannot recommend one in particular, the term you are looking for is OCR (Optical Character Recognition).

There is tesseract-ocr which is a professional library to do this.
From their web site:
The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but it is probably one of the most accurate open source OCR engines available.

I think what you want is Conjecture. Used to be the libgocr project. I haven't used it for a few years but it used to be very reliable if you set up a key.

The Tesseract OCR library gives pretty accurate results; it's a C and C++ library.
My initial results were around 80% accurate, but after applying pre-processing to the images before feeding them in for OCR, the results were around 95% accurate.
What the pre-processing was:
1) Binarize the bitmap (B&W worked better for me); a rough sketch of one way to do this follows below.
2) Resample your image to 300 dpi.
3) Save your image in a lossless format, such as LZW TIFF or CCITT Group 4 TIFF.
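Here is a rough OpenCV sketch of steps 1 and 2 (just one way to do it; the Otsu threshold and the scale factor are assumptions you would tune for your own scans):

#include <opencv2/opencv.hpp>

int main() {
    cv::Mat src = cv::imread("scan.png", cv::IMREAD_GRAYSCALE);  // placeholder file name

    // 2) Resample: scale so the text is roughly the size it would be at 300 dpi.
    //    The factor 2.0 is an assumption; compute it from your scanner's actual dpi.
    cv::Mat resized;
    cv::resize(src, resized, cv::Size(), 2.0, 2.0, cv::INTER_CUBIC);

    // 1) Binarize: a global Otsu threshold gives a clean black & white bitmap.
    cv::Mat bw;
    cv::threshold(resized, bw, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);

    // 3) Save losslessly (PNG here; TIFF also works).
    cv::imwrite("scan_bw.png", bw);
    return 0;
}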

Related

Restrict Preprocessing of Tesseract

I am new to the tesseract library and I set it up on Ubuntu 12.04.
I am using this data set to be recognized. When I fed these images to tesseract as is (without any preprocessing) using this code, I was getting approximately 70-75% accuracy.
I want accuracy to be 90+%, so I did some preprocessing. The steps I followed to enhance the images are:
Steps for Preprocessing
Applied a bottom-hat operator with a structuring element of a circle of radius 12
Complemented the image to make the background white and the text black
Enhanced the contrast of the resultant image
Eroded the image.
After these steps I get pretty clear images, which can be seen here. But now when I feed these images to tesseract using that same code, accuracy is reduced to < 50% and I don't know why. Is it because tesseract does some preprocessing as well? If yes, how can I restrict tesseract from doing that preprocessing? If not, why is it giving me bad results when the images are pretty clear now? Pardon me if I have asked a basic question.
Regarding your question of why tesseract delivers better results when using a binary image instead of a grey image as input:
Tesseract does an internal binarization of the greyscale image with various methods (I haven't figured out exactly which binarization method is used; sometimes a local adaptive threshold, sometimes a global Otsu threshold is mentioned on the internet). What is certain is that tesseract performs character recognition on a binary image, and that tesseract's preprocessing can still fail on specific problems (it doesn't have good layout analysis, for example). So if you do the preprocessing part yourself, give tesseract only a binary image with text as input and disable all layout analysis in tesseract, you can achieve better results than letting tesseract do everything for you. Since it is a free open-source utility, it has some known drawbacks, which have to be accepted.
If you use tesseract as a command line tool, this thread is very useful for the parameters: tesseract command line page segmentation.
If you use the source code of tesseract in developing your own C++ code, you have to initialize tesseract with the correct parameters. The parameters are described at the tesseract API site: tesseract API.
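As a rough C++ illustration of the parameter side (not the only way to do it): the page segmentation mode is the main switch for tesseract's own layout analysis. The mode and file name below are assumptions; pick whatever matches your input.

#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main() {
    tesseract::TessBaseAPI api;
    api.Init(nullptr, "eng");

    // Tell tesseract the image is already a single uniform block of text,
    // so it skips most of its own layout analysis.
    api.SetPageSegMode(tesseract::PSM_SINGLE_BLOCK);

    Pix *pix = pixRead("preprocessed_bw.png");   // your already-binarized image
    api.SetImage(pix);

    char *text = api.GetUTF8Text();
    // ... use text ...
    delete[] text;
    api.End();
    pixDestroy(&pix);
    return 0;
}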
Well, I was feeding a greyscale (8 bpp) image to tesseract after preprocessing, so after getting that greyscale image tesseract tries to binarize it, i.e. convert it to black and white, and that was giving me bad results; I still don't know why.
But after that I tried first converting my greyscale image to a b/w (1 bpp) image myself, and when I fed that image to tesseract I got relatively much better results.

Improving tesseract performance for specific tasks

I have already read the answers to this question.
I have a series of images that contain a single word of 3-10 characters. They are images created on a computer, so the quality of the images is consistent and the images don't have any noise on them. The fonts are quite large (about 30 pixels in height). This should already be easy enough for tesseract to read accurately, but what are some techniques I can use for improving the speed, even if it's only an improvement of a few milliseconds?
The character set consists of uppercase letters only. As the OCR task in this case is very specific, would it help if I train the tesseract engine with this specific font and font size, or is that overkill?
Edited to include sample
Other than tesseract, are there any other solutions that I can use with C/C++ that can provide better performance? Could it be done faster with OpenCV? Compatibility with Linux is preferred.
Sample
If all the letters have the same size & style, you can try something really simple like running blob detection followed by template matching of individual letters. I am not sure how it will compare with tesseract, but it is a very simple experiment. (Additionally, lowering the resolution will speed things up...)
You can also have a look at this question: Simple Digit Recognition OCR in OpenCV-Python, it may be relevant
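Here is a very rough OpenCV sketch of that idea: connected components to find the letter blobs, then template matching of each blob against a small set of letter templates. The template images, the 0.8 matching threshold and the file names are assumptions, and left-to-right blob ordering is glossed over.

#include <opencv2/opencv.hpp>
#include <string>
#include <vector>

int main() {
    cv::Mat img = cv::imread("word.png", cv::IMREAD_GRAYSCALE);  // placeholder file name

    // Binarize (letters assumed dark on a light background).
    cv::Mat bw;
    cv::threshold(img, bw, 0, 255, cv::THRESH_BINARY_INV | cv::THRESH_OTSU);

    // Blob detection via connected components.
    cv::Mat labels, stats, centroids;
    int n = cv::connectedComponentsWithStats(bw, labels, stats, centroids);

    // Hypothetical templates: one binary image per uppercase letter, same scale as the input.
    std::vector<std::pair<char, cv::Mat>> templates;  // fill from files, e.g. "A.png", "B.png", ...

    std::string result;
    for (int i = 1; i < n; ++i) {  // label 0 is the background
        cv::Rect box(stats.at<int>(i, cv::CC_STAT_LEFT), stats.at<int>(i, cv::CC_STAT_TOP),
                     stats.at<int>(i, cv::CC_STAT_WIDTH), stats.at<int>(i, cv::CC_STAT_HEIGHT));
        cv::Mat blob = bw(box);

        char best = '?';
        double bestScore = 0.8;  // assumed minimum match score
        for (auto &t : templates) {
            if (t.second.cols > blob.cols || t.second.rows > blob.rows) continue;
            cv::Mat scores;
            cv::matchTemplate(blob, t.second, scores, cv::TM_CCOEFF_NORMED);
            double maxVal;
            cv::minMaxLoc(scores, nullptr, &maxVal);
            if (maxVal > bestScore) { bestScore = maxVal; best = t.first; }
        }
        result += best;  // note: blobs should really be sorted left-to-right first
    }
    return 0;
}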

Extracting part of a scanned document (personal ID) - which library and method to choose?

I have to process a lot of scanned IDs and I need to extract photos from them for further processing.
Here's a fictional example:
The problem is that the scans are not perfectly aligned (rotated up to 10 degrees). So I need to find their position, rotate them and cut out the photo. This turned out to be a lot harder than I originally thought.
I checked OpenCV and the only thing I found was rectangle detection, but it didn't give me good results: the rectangle did not always match well enough on my samples. Also, its image matching algorithm only works for non-rotated images, since it's just a brute-force comparison.
So I thought about using ARToolkit (an augmented reality lib), because I know that it's able to locate a given marker on an image very precisely. But it seems that the markers have to be very simple, so I can't use a constant part of the document for this purpose (please correct me if I'm wrong). Also, I found it super-hard to compile on Ubuntu 11.10.
OCR - I haven't tried this one yet, and before I start my research I'd be thankful for any suggestions on what to look for.
I'm looking for a C (preferably) or C++ solution. Python is an option too.
If you don't find another ideal solution, one method I ended up using for OCR preprocessing in the past was to convert the source images to PPM and use unpaper in Ubuntu. You can attempt to deskew the image based on whichever sides you specify as having clearly-defined edges, and there is an option to bypass the filters that would normally be applied to black and white text. You probably don't want those for images.
Example for images skewed no more than 15 degrees, using the bottom and right edges to detect rotation:
unpaper -n -dn bottom,right -dr 15 input.ppm output.ppm
unpaper was written in C, if the source is any help to you.

Assessing the quality of an image with respect to compression?

I have images that I am using for a computer vision task. The task is sensitive to image quality. I'd like to remove all images that are below a certain threshold, but I am unsure if there is any method/heuristic to automatically detect images that are heavily compressed via JPEG. Anyone have an idea?
Image Quality Assessment is a rapidly developing research field. As you don't mention being able to access the original (uncompressed) images, you are interested in no reference image quality assessment. This is actually a pretty hard problem, but here are some points to get you started:
Since you mention JPEG, there are two major degradation features that manifest themselves in JPEG-compressed images: blocking and blurring
No-reference image quality assessment metrics typically look for those two features
Blocking is fairly easy to pick up, as it appears only on macroblock boundaries. Macroblocks are a fixed size -- 8x8 or 16x16 depending on what the image was encoded with
Blurring is a bit more difficult. It occurs because higher frequencies in the image have been attenuated (removed). You can break the image into blocks, DCT (Discrete Cosine Transform) each block and look at the high-frequency components of the DCT result. If the high-frequency components are lacking for a majority of blocks, then you are probably looking at a blurry image (a rough sketch of this block-DCT check follows after these points)
Another approach to blur detection is to measure the average width of edges of the image. Perform Sobel edge detection on the image and then measure the distance between local minima/maxima on each side of the edge. Google for "A no-reference perceptual blur metric" by Marziliano -- it's a famous approach. "No Reference Block Based Blur Detection" by Debing is a more recent paper
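To make the block-DCT idea above concrete, here is a rough OpenCV sketch. The 8x8 block size matches JPEG; the 3x3 "low-frequency corner" and how you threshold the final ratio are assumptions to tune.

#include <opencv2/opencv.hpp>
#include <cmath>
#include <cstdio>

int main() {
    cv::Mat gray = cv::imread("photo.jpg", cv::IMREAD_GRAYSCALE);  // placeholder file name
    cv::Mat f;
    gray.convertTo(f, CV_32F);

    double lowFreq = 0.0, highFreq = 0.0;
    for (int y = 0; y + 8 <= f.rows; y += 8) {
        for (int x = 0; x + 8 <= f.cols; x += 8) {
            cv::Mat block = f(cv::Rect(x, y, 8, 8)).clone(), coeffs;
            cv::dct(block, coeffs);
            for (int v = 0; v < 8; ++v) {
                for (int u = 0; u < 8; ++u) {
                    if (u == 0 && v == 0) continue;        // skip the DC term
                    double e = std::abs(coeffs.at<float>(v, u));
                    if (u < 3 && v < 3) lowFreq += e;      // assumed 3x3 low-frequency corner
                    else                highFreq += e;
                }
            }
        }
    }
    double total = lowFreq + highFreq;
    double ratio = (total > 0.0) ? highFreq / total : 0.0;
    printf("high-frequency energy ratio: %f\n", ratio);   // a low ratio suggests a blurry image
    return 0;
}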
Regardless of what metric you use, think about how you will deal with false positives/negatives. As opposed to simple thresholding, I'd use the metric result to sort the images and then snip the end of the list that looks like it contains only blurry images.
Your task will be a lot simpler if your image set contains fairly similar content (e.g. faces only). This is because the image quality assessment metrics can often be influenced by image content, unfortunately.
Google Scholar is truly your friend here. I wish I could give you a concrete solution, but I don't have one yet -- if I did, I'd be a very successful Masters student.
UPDATE:
Just thought of another idea: for each image, re-compress the image with JPEG and examine the change in file size before and after re-compression. If the file size after re-compression is significantly smaller than before, then it's likely the image is not heavily compressed, because it had some significant detail that was removed by re-compression. Otherwise (very little difference or file size after re-compression is greater) it is likely that the image was heavily compressed.
The use of the quality setting during re-compression will allow you to determine what exactly heavily compressed means.
If you're on Linux, this shouldn't be too hard to implement using bash and ImageMagick's convert utility.
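If you would rather stay in C++ than bash, roughly the same check can be done with OpenCV's in-memory JPEG encoder. A sketch, where the quality setting of 85 and the placeholder file name are arbitrary assumptions:

#include <opencv2/opencv.hpp>
#include <filesystem>
#include <vector>
#include <cstdio>

int main() {
    const char *path = "photo.jpg";                  // placeholder file name
    auto originalSize = std::filesystem::file_size(path);

    cv::Mat img = cv::imread(path, cv::IMREAD_COLOR);
    std::vector<unsigned char> buf;
    cv::imencode(".jpg", img, buf, {cv::IMWRITE_JPEG_QUALITY, 85});  // re-compress in memory

    double ratio = static_cast<double>(buf.size()) / static_cast<double>(originalSize);
    // A ratio well below 1 suggests the original still had detail to throw away,
    // i.e. it was probably not heavily compressed; a ratio near or above 1 suggests it was.
    printf("recompressed/original size ratio: %f\n", ratio);
    return 0;
}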
You can try other variations of this approach:
Instead of JPEG compression, try another form of degradation, such as Gaussian blurring
Instead of merely comparing file-sizes, try a full reference metric such as SSIM -- there's an OpenCV implementation freely available. Other implementations (e.g. Matlab, C#) also exist, so look around.
Let me know how you go.
I had many photos shot of an ancient book (so similar layout, two pages per image), but some were so blurred that the text could not be read. I searched for a ready-made batch script to find the most blurred ones, but didn't find anything useful, so I used another script I got off the net (based on ImageMagick, but no longer working; I couldn't retrieve the author for the credits!) for assessing the blur level of a single image, tweaked it, and automated it over a whole folder. I uploaded it here:
https://gist.github.com/888239
hoping it will be useful for someone else. It works on a Linux system and uses ImageMagick (and some commonly installed command line tools, such as gawk, sort, grep, etc.).
One simple heuristic could be to look at width * height * color depth < sigma * file size. You would have to determine a good value for sigma, of course. sigma would be dependent on the expected entropy of the images you are looking at.
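A small C++ sketch of that heuristic, where the sigma of 0.25 is purely a placeholder you would calibrate on your own images:

#include <opencv2/opencv.hpp>
#include <filesystem>
#include <cstdio>

int main() {
    const char *path = "photo.jpg";                        // placeholder file name
    cv::Mat img = cv::imread(path, cv::IMREAD_UNCHANGED);

    double rawBytes  = static_cast<double>(img.total()) * img.elemSize();  // width * height * colour depth (bytes)
    double fileBytes = static_cast<double>(std::filesystem::file_size(path));
    double sigma = 0.25;                                   // assumed value, tune for your data

    if (rawBytes < sigma * fileBytes)
        printf("probably lightly compressed\n");   // file is large relative to raw content
    else
        printf("probably heavily compressed\n");
    return 0;
}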

What is the difference between ImageMagick and GraphicsMagick?

I've found myself evaluating both of these libs. Apart from what the GraphicsMagick comparison says, I see that ImageMagick still gets updates, and it seems that the two are almost identical.
I'm just looking to do basic image manipulation in C++ (i.e. image load, filters, display); are there any differences I should be aware of when choosing between these libraries?
As with many things in life, different people have different ideas about what is best. If you ask a landscape photographer who wanders around in the rain in Scotland's mountains which is the best camera in the world, he's going to tell you a light-weight, weather-sealed camera. Ask a studio photographer, and he'll tell you the highest resolution one with the best flash sync speed. And if you ask a sports photographer he'll tell you the one with the fastest autofocus and highest frame rate. So it is with ImageMagick and GraphicsMagick.
Having answered around 2,000 StackOverflow questions on ImageMagick over the last 5+ years, I make the following observations...
In terms of popularity...
ImageMagick questions on SO outnumber GraphicsMagick questions by a factor of 12:1 (7,375 questions vs 611 as of May 2019), and
ImageMagick followers on SO outnumber GraphicsMagick followers by 15:1 (387 followers versus 25 as of May 2019)
In terms of performance...
I am happy to concede that GraphicsMagick may be faster for some, but not all problems. However, if speed is your most important consideration, I think you should probably be using either libvips, or parallel code on today's multi-core CPUs or heavily SIMD-optimised (or GPU-optimised) libraries like OpenCV.
In terms of features and flexibility...
There is one very clear winner here - ImageMagick. My experience is that there are many features missing from GraphicsMagick which are present in ImageMagick and I list some of these below, in no particular order.
I freely admit I am not as familiar with GraphicsMagick as I am with ImageMagick, but I made my very best effort to find any mention of the features in the most recent GraphicsMagick source code. So, for Canny Edge Detector, I ran the following command on the GM source code:
find . -type f -exec grep -i Canny {} \;
and found nothing.
Canny Edge detector
This appears to be completely missing in GM. See -canny radiusxsigma{+lower-percent}{+upper-percent} in IM.
See example here and sample of edge-detection on Lena image:
Parenthesised processing, sophisticated re-sequencing
This is a killer feature of ImageMagick that I frequently sorely miss when having to use GM. IM can load, or create, or clone a whole series of images and apply different processing selectively to specific images and re-sequence, duplicate and re-order them very simply and conveniently. It is hard to convey the incredible flexibility this affords you in a short answer.
Imagine you want to do something fairly simple like load image A and blur it, load image B and make it greyscale and then place the images side-by-side with Image B on the left. That looks like this with ImageMagick:
magick imageA.png -blur x3 \( imageB.png -colorspace gray \) +swap +append result.png
You can't even get started with GM; it will complain about the parentheses. If you remove them, it will complain about swapping the image order. If you remove that, it will apply the greyscale conversion to both images (because it doesn't understand parentheses) and place imageA on the left. A rough Magick++ equivalent of this example is sketched after the list of sequencing commands below.
See the following sequencing commands in IM:
-swap
-clone
-duplicate
-delete
-insert
-reverse
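Since the question asked about C++: roughly the same side-by-side example can be written against the Magick++ API (both projects ship a Magick++ binding, though feature coverage differs). A minimal, untested sketch with placeholder file names:

#include <Magick++.h>
#include <list>

int main(int argc, char **argv) {
    Magick::InitializeMagick(*argv);

    Magick::Image a("imageA.png");
    a.blur(0, 3);                      // radius 0 lets the library derive it from sigma 3

    Magick::Image b("imageB.png");
    b.type(Magick::GrayscaleType);     // convert to greyscale

    // Append B (left) and A (right) side by side.
    std::list<Magick::Image> row = { b, a };
    Magick::Image result;
    Magick::appendImages(&result, row.begin(), row.end());
    result.write("result.png");
    return 0;
}

Compiler and linker flags usually come from Magick++-config (IM) or GraphicsMagick++-config (GM).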
fx DIY Image Processing Operator
IM has the -fx operator which allows you to create and experiment with incredibly sophisticated image processing. You can have a function evaluated for every single pixel in an image. The function can be as complicated as you like (save it in a file if you want to) and use all mathematical operations, ternary-style if statements, references to pixels even in other images and their brightness or saturation and so on.
Here are a couple of examples:
magick rose: -channel G -fx 'sin(pi*i/w)' -separate fx_sine_gradient.gif
magick -size 80x80 xc: -channel G -fx 'sin((i-w/2)*(j-h/2)/w)/2+.5' -separate fx_2d_gradient.gif
A StackOverflow answer that uses this feature to great effect in processing green-screen (chroma-keyed) images is here.
Fourier (frequency domain) Analysis
There appears to be no mention of forward or reverse Fourier Analysis in GM, nor the High Dynamic Range support (see later) that is typically required to support it. See -fft in IM.
Connected Component Analysis / Labelling / Blob Analysis
There appears to be no "Connected Component Analysis" in GM - also known as "labelling" and "Blob Analysis". See -connected-components connectivity for 4- and 8-connected blob analysis.
This feature alone has provided 60+ answers - see here.
Hough Line Detection
There appears to be no Hough Line Detection in GM. See -hough-lines widthxheight{+threshold} in IM.
See description of the feature here and following example of detected lines:
Moments and Perceptual Hash (pHash)
There appears to be no support for image moments calculation (centroids and higher orders), nor Perceptual Hashing in GM. See -moments in IM.
Morphology
There appears to be no support for Morphological processing in GM. In IM there is sophisticated support for:
dilation
erosion
morphological opening and closing
skeletonisation
distance morphology
top hat and bottom hat morphology
Hit and Miss morphology - line ends, line junctions, peaks, ridges, Convex Hulls etc
See all the sophisticated processing you can do with this great tutorial.
Contrast Limited Adaptive Histogram Equalisation - CLAHE
There appears to be no support for Contrast Limited Adaptive Histogram Equalisation in GM. See -clahe widthxheight{%}{+}number-bins{+}clip-limit{!} in IM.
HDRI - High Dynamic Range Imaging
There appears to be no support for High Dynamic Range Imaging in GM - just 8, 16, and 32-bit integer types.
Convolution
ImageMagick supports many types of convolution:
Difference of Gaussians DoG
Laplacian
Sobel
Compass
Prewitt
Roberts
Frei-Chen
None of these are mentioned in the GM source code.
Magick Persistent Register (MPR)
This is an invaluable feature present in ImageMagick that allows you to write intermediate processing results to named chunks of memory during processing without the overhead of writing to disk. For example, you can prepare a texture or pattern and then tile it over an image, or prepare a mask and then alter it and apply it later in the same processing without going to disk.
Here's an example:
magick tree.gif -flip -write mpr:tree +delete -size 64x64 tile:mpr:tree mpr_tile.gif
Broader Colourspace Support
IM supports the following colourspaces not found in GM:
CIELab
HCL
HSI
LMS
others.
Pango Support
IM supports Pango Text Markup Language which is similar to HTML and allows you to annotate images with text that changes:
font, colour, size, weight, italics
subscript, superscript, strike-through
justification
mid-sentence and much, much more. There is a great example here.
Shrink-on-load with JPEG
This invaluable feature allows the library to shrink JPEG images as they are read from disk, so that only the necessary coefficients are read, so the I/O is lessened, and the memory consumption is minimised. It can massively improve performance when down-scaling images.
See example here.
Defined maximum JPEG size when writing
IM supports the much-requested option to specify a maximum filesize when writing JPEG files, -define jpeg:extent=400KB for example.
Polar coordinate transforms
IM supports conversion between cartesian and polar coordinates, see -distort polar and -distort depolar.
Statistics and operations on customisable areas
With its -statistic MxN operator, ImageMagick can generate many useful kinds of statistics and effects. For example, you can set each pixel in an image to the gradient (difference between brightest and darkest) of its 5x3 neighbourhood:
magick image.png -statistic gradient 5x3 result.png
Or you can set each pixel to the median of its 1x200 neighbourhood:
magick image.png -statistic median 1x200 result.png
See example of application here.
Sequences of images
ImageMagick supports sequences of images, so if you have a set of very noisy images shot at high ISO, you can load up the entire sequence of images and, for example, take the median or average of all images to reduce noise. See the -evaluate-sequence operator. I do not mean the median in a surrounding neighbourhood in a single image, I mean by finding the median of all images at each pixel position.
The above is not an exhaustive list by any means, they are just the first few things that came to mind when I thought about the differences. I didn't even mention support for HEIC (Apple's format for iPhone images), increasingly common High Dynamic Range formats such as EXR, or any others. In fact, if you compare the file formats supported by the two products (gm convert -list format and magick identify -list format) you will find that IM supports 261 formats and GM supports 192.
As I said, different people have different opinions. Choose the one that you like and enjoy using it.
As always, I am indebted to Anthony Thyssen for his excellent insights and discourse on ImageMagick at https://www.imagemagick.org/Usage/ Thanks also to Fred Weinhaus for his examples.
From what I have read GraphicsMagick is more stable and is faster.
I did a couple of unscientific tests and found gm to be twice as fast as im (doing a resize).
I found ImageMagick to be incredibly slow for processing TIFF group-4 images (B&W document images), mainly due to the fact that it converts from 1-bit-per-pixel to 8 and back again to do any image manipulation. The GraphicsMagick group overhauled the TIFF format support with their version 1.2, and it is much faster at processing these types of images than the original ImageMagick was. The current GraphicsMagick stable release is at 1.3.5.
I use ImageMagick when speed isn't a factor. However on the server side, where tens of thousands of images are being processed daily, GraphicsMagick is quite noticeably faster - in some cases up to 50% faster in benchmarks!
History
graphicsmagick was forked from imagemagick back in 2002 due to disputes between founding developers. thus they share the same codebase.
Ref : https://en.wikipedia.org/wiki/GraphicsMagick
Goal
graphicsmagick
focuses on a simple, stable, and clearer codebase / architecture
imagemagick
focuses on rolling out new features and extending a wider toolbase
Other than speed, imagemagick adds a number of cli tools to the terminal shell, whereas graphicsmagick is a single tool which you can call.
CLI interface design
graphicsmagick
gm <command> <options> <file>
imagemagick
convert <options> <file>
compare <options> <file>
imho, i prefer (in fact, only use) graphicsmagick (gm) over imagemagick, as the latter has a higher chance of tool name clashes, which causes lots of issues in finding out why certain tools are not running, especially during server side automation tasks. in summary, graphicsmagick has a much clearer design.
imagine a binary called convert in a project: is it imagemagick's convert or your own home-rolled tool in the project that will be called?
list of imagemagick tools (including convert, compare, display) : https://imagemagick.org/script/command-line-tools.php
list of graphicsmagick commands :
http://www.graphicsmagick.org/utilities.html
note: as of v7, as mentioned by Mark S, imagemagick is now distributed as a single binary, which also supports the older v6 commands.
Performance
a simple memory consumption test can be found here :
https://coderwall.com/p/1l7h-a/imagemagick-bloat-graphicsmagick
Dependencies
GraphicsMagick depends on 36 libraries whereas ImageMagick requires 64. Ref : http://www.graphicsmagick.org/1.3/FAQ.html
Note that GraphicsMagick provides API and ABI stability, which isn't part of the guarantee for ImageMagick. This would be important in the long run unless you are vendoring all your dependencies.
GraphicsMagick was an early fork from Imagemagick. You can read about Imagemagick's history and the fork to GraphicsMagick at https://imagemagick.org/script/history.php. It seems that Imagemagick has continued to be developed rather extensively, while GraphicsMagick has remained more or less stagnant since the fork.