Automatic Number Plate Recognition - C++

As the title suggests, I want to build an ANPR application on Windows. I am working with Brazilian number plates and using OpenCV.
So far I have managed to extract the letters from the number plate. The following images show some of the characters I have extracted.
My problem is how to recognize those characters. I tried Google's Tesseract, but it sometimes fails to recognize them. Then I tried to train an OCR database using OpenCV, with about 10 images for each character, but that did not work properly either.
So I am stuck here. I need this for my final-year project. Can anybody help? I would really appreciate it.
The following site does it very nicely:
https://www.anpronline.net/demo.html
Thank you.

You could train an ANN or a multi-class SVM on the letter images, like here.

Check out OpenALPR (http://www.openalpr.com). It already has the problem solved.
If you need to do it yourself, you really do need to train Tesseract; it will give you the best results. 10 images per character is not enough: you need dozens or hundreds. If you can find a font that is similar to your plate characters, a good approach is to print out a sheet of paper with every character repeated multiple times, then take 5-10 pictures of the page with your camera. These pictures can then be your input for training Tesseract.

Related

Improve quality of tesseract ocr result

I'm developing an OCR app for Android using JNI and C++ code built on OpenCV and Tesseract. It will be used to read a badge with an alphanumeric ID from a photo taken by the app.
I wrote code that preprocesses the image in order to obtain a "readable image" like the one below:
I wrote the following function for "reading" the image with Tesseract:
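#include <opencv2/core.hpp>
#include <tesseract/baseapi.h>

// Runs Tesseract on a preprocessed 8-bit image.
// The caller owns the returned string and must release it with delete[].
char* read_text(const cv::Mat& input_image)
{
    tesseract::TessBaseAPI text_recognizer;
    // Init returns non-zero on failure (for example, missing language data).
    if (text_recognizer.Init("/usr/share/tesseract-ocr/tessdata", "eng", tesseract::OEM_TESSERACT_ONLY) != 0)
        return NULL;
    text_recognizer.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789");
    // Arguments: data, width, height, bytes per pixel, bytes per row.
    text_recognizer.SetImage(input_image.data, input_image.cols, input_image.rows, input_image.channels(), static_cast<int>(input_image.step));
    text_recognizer.Recognize(NULL);
    return text_recognizer.GetUTF8Text();
}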
The expected result is "KQ 978 A3705", but what I get is "KO 978 H375".
I followed all the recommendations for improving image quality from https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality. In addition, after reading the docs at https://github.com/tesseract-ocr/docs, I tried approximating the character shapes with polygons in order to get "better" features. The image I used looks like this:
With this image, I get "KO 978 A3705". The result is clearly better than the previous one, but it is still not correct (the Q is read as O).
I think the processed image I pass to Tesseract is good enough to get a correct result, yet I don't get one. I don't know what else to do, so I am asking for ideas to solve this problem. I need an exact result, and I think it should be achievable with the processed image I have. Ideas, please! =)
I noticed that after applying some dilations, the OCR result improved remarkably. For me, that was the solution.
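To illustrate what a dilation pass does, here is a minimal sketch in plain C++ that works on a small binary grid rather than on a real image. The `Image` alias and the `dilate3x3` function are hypothetical names made up for this example; in an OpenCV pipeline the equivalent step would normally be a call to `cv::dilate`. Which polarity to dilate (text or background) and how many passes help are assumptions that depend on the image.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical image type for this sketch: 1 = foreground, 0 = background.
using Image = std::vector<std::vector<int>>;

// One pass of binary dilation with a 3x3 square structuring element:
// every foreground pixel is spread to its 8 neighbours.
Image dilate3x3(const Image& src)
{
    const std::size_t rows = src.size();
    const std::size_t cols = rows ? src[0].size() : 0;
    Image dst(rows, std::vector<int>(cols, 0));

    for (std::size_t r = 0; r < rows; ++r) {
        assert(src[r].size() == cols); // the grid must be rectangular
        for (std::size_t c = 0; c < cols; ++c) {
            if (src[r][c] == 0)
                continue;
            for (int dr = -1; dr <= 1; ++dr) {
                for (int dc = -1; dc <= 1; ++dc) {
                    const long nr = static_cast<long>(r) + dr;
                    const long nc = static_cast<long>(c) + dc;
                    // Skip neighbours that fall outside the image.
                    if (nr < 0 || nc < 0 ||
                        nr >= static_cast<long>(rows) ||
                        nc >= static_cast<long>(cols))
                        continue;
                    dst[static_cast<std::size_t>(nr)][static_cast<std::size_t>(nc)] = 1;
                }
            }
        }
    }
    return dst;
}
```

Calling `dilate3x3` repeatedly thickens strokes by one pixel per pass in every direction, which can close small gaps inside characters before the image is handed to the OCR engine.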

Where can I find a ready-made image pack for training an OpenCV face recognition system?

So...
Where can I find a ready-made image pack for training an OpenCV face recognition system?
Can anybody help?
Have a look here.
The AT&T faces database has probably been used a lot (if you look at the docs).
Once you have downloaded a set of images, you'll want to run the little Python script to generate the CSV file needed for training.
If you opt for the Yale database, you'll have to convert the images to PNG or PGM first (OpenCV can't handle GIFs).
But honestly, in the end you want to use a database that consists entirely of the faces you want to recognize (that is, your own database).
Unlike most ML algorithms, it does not need explicit "negative" images (people other than those you want to recognize) here. Those only add noise and degrade the actual recognition.
The only situation where you'd want them is when there's only one person to recognize. Then you'd need some others in there to increase "contrast".
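As a rough sketch of what the CSV-generation step produces, the following C++ function builds one line per image in the form `path;label`, with one integer label per person. The semicolon-separated layout is an assumption based on the CSV files used in the OpenCV face recognition tutorial. The function name `make_csv_lines`, the `Subject` alias and the directory layout (`base_dir/<person>/<image>`) are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper type: a person's directory name plus the image
// file names found inside it.
using Subject = std::pair<std::string, std::vector<std::string>>;

// Builds "path;label" lines, assigning label 0 to the first subject,
// 1 to the second, and so on.
std::vector<std::string> make_csv_lines(const std::string& base_dir,
                                        const std::vector<Subject>& subjects,
                                        char separator = ';')
{
    assert(!base_dir.empty());
    std::vector<std::string> lines;
    for (std::size_t label = 0; label < subjects.size(); ++label) {
        for (const std::string& file : subjects[label].second) {
            lines.push_back(base_dir + "/" + subjects[label].first + "/" + file
                            + separator + std::to_string(label));
        }
    }
    return lines;
}
```

For a database laid out as `s1/1.pgm`, `s1/2.pgm`, `s2/1.pgm`, this yields lines such as `/data/att_faces/s1/1.pgm;0`, which can then be written to a text file and read back when loading the training images.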

Some (trained) words cannot be recognized by Tesseract-OCR

I am currently using Tesseract-OCR to recognize text in a picture, but I have a problem: some words cannot be recognized. I trained Tesseract specifically on them, and it still does not work.
Do I need some extra files when training the language data, such as the DAWG files? I have no idea about that. What puzzles me is that it can sometimes recognize a few of these words when they appear at certain positions and orientations.
It is really confusing. I sincerely need your help. Thanks in advance!
Other info:
I am using Simplified Chinese. (I don't know whether there are parameters I failed to set for Chinese.)
The picture I want to recognize is a table, so there are a few ruling lines in it. Do you have any ideas for improving accuracy when recognizing tables?
Since I don't know whether this is caused by the particular shape of the words, I am pasting some of them directly here: 上下午一二三四五
Many thanks!

3-fold cross-validation using Joachims' SVM light

I need to do a 3-fold cross-validation using Joachims' SVM light. Cross-validation and SVMs are new to me, and I don't know if I'm doing it right. What have I done so far? I split my data into 3 files, which I called fold1.txt, fold2.txt and fold3.txt, with my features in the following format:
1 numberofthefeature:1 numberofthefeature:1 ...
I also created a file called words.txt with my tokens, where the line number of each token is its numberofthefeature. Did I do everything right?
So now I have to do the 3-fold cross-validation, but I don't know how to do it with SVM light. I don't know how to make SVM light learn and classify using the three files, or how to choose which ones to use for testing and which for training. Do I have to write a script or a program to do it?
Thanks, everybody.
Thiago
I am going to assume that you are doing text mining, as you are referring to Thorsten Joachims. Anyway, here is a set of tutorial videos on text classification with cross-validation:
http://vancouverdata.blogspot.ca/2010/11/text-analytics-with-rapidminer-part-5.html
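If you would rather drive SVM light directly, a small program can do the bookkeeping. The sketch below only plans the three runs: each fold is the test set once, and the other two folds are joined into one training file. It assumes the standard command-line form of the SVM light tools (`svm_learn example_file model_file` and `svm_classify example_file model_file output_file`). The names `FoldRun`, `plan_cross_validation` and `concatenate`, and the generated file names (`train1.txt`, `model1`, `predictions1.txt`), are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical description of one cross-validation run.
struct FoldRun {
    std::string test_file;                // the held-out fold
    std::vector<std::string> train_files; // all remaining folds
    std::string train_out;                // file the training folds are joined into
    std::string learn_cmd;                // command line for svm_learn
    std::string classify_cmd;             // command line for svm_classify
};

// For k fold files, produce k runs; run i tests on fold i and trains on the rest.
std::vector<FoldRun> plan_cross_validation(const std::vector<std::string>& folds)
{
    assert(folds.size() >= 2);
    std::vector<FoldRun> runs;
    for (std::size_t i = 0; i < folds.size(); ++i) {
        FoldRun run;
        run.test_file = folds[i];
        for (std::size_t j = 0; j < folds.size(); ++j)
            if (j != i)
                run.train_files.push_back(folds[j]);
        const std::string id = std::to_string(i + 1);
        const std::string model = "model" + id;
        run.train_out = "train" + id + ".txt";
        run.learn_cmd = "svm_learn " + run.train_out + " " + model;
        run.classify_cmd = "svm_classify " + run.test_file + " " + model +
                           " predictions" + id + ".txt";
        runs.push_back(run);
    }
    return runs;
}

// Joins several example files line by line into one training file.
bool concatenate(const std::vector<std::string>& inputs, const std::string& output)
{
    std::ofstream out(output);
    if (!out)
        return false;
    for (const std::string& name : inputs) {
        std::ifstream in(name);
        if (!in)
            return false;
        std::string line;
        while (std::getline(in, line))
            out << line << '\n';
    }
    return static_cast<bool>(out);
}
```

For each run you would call `concatenate(run.train_files, run.train_out)`, execute `run.learn_cmd` and then `run.classify_cmd` (for example with `std::system`), and finally average the accuracy reported over the three runs.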

converting image sequence to video

I want to make a screen capture utility. So far I am able to capture the screen at regular intervals to get a numbered sequence of images, and now I want to encode them into a video format, preferably FLV (because of its good compression and web support).
I tried ffmpeg.exe for this, but for some strange reason it didn't work on my Vista Ultimate machine: only the first picture is encoded, and I don't know what happened to the rest.
Also, I would prefer to do the encoding programmatically (using a C/C++ library API, if there is one for that purpose) rather than using a tool such as ffmpeg.exe. I am interested in encoding a picture sequence to video, not in capturing continuous video directly.
I searched the internet. There are lots of libraries and tutorials for converting between video formats, but I didn't find anything useful for my problem.
I am not very proficient with video formats and SDK libraries; I just need a quick way to encode some pictures to video with some basic control (such as the time interval between two consecutive frames).
So can you give me some pointers as to which library I should use and how? A code fragment and a short descriptive answer would help greatly. Please don't recommend any .NET solution: I need to learn something from this and don't want to apply a brute-force approach to the problem.
Sorry for my English, and thanks in advance.
It appears that an .avi file can more or less directly be made from .jpg files:
An AVI file may carry audio/visual data inside the chunks in virtually any compression scheme, including Full Frame (Uncompressed), ..., Motion JPEG.
Also, something very similar has been discussed here before.