Inaccurate Tesseract OCR results from a PNG image in Qt C++ - c++

I am using the Tesseract OCR C++ library in Qt to extract text from a PNG image,
using this code:
const char* lang = "eng";
QString filename = "D:/image.png";
tesseract::TessBaseAPI tess;
tess.Init(NULL, lang, tesseract::OEM_DEFAULT);
tess.SetPageSegMode(tesseract::PSM_AUTO);
FILE* fin = fopen(filename.toStdString().c_str(), "rb");
if (fin == NULL)
{
std::cout << "Cannot open " << filename.toStdString().c_str() << std::endl;
return;
}
fclose(fin);
STRING text;
if (tess.ProcessPages(filename.toStdString().c_str(), NULL, 0, &text))
{
ui->plainTextEdit->setPlainText(QString::fromUtf8(text.string()));
//show result in plainttext qt gui
}
But the result is not accurate enough for the data in the table, and it contains strange characters. When I use an online OCR website to convert the same image to text, the result is 100% accurate. So what makes Tesseract give me this wrong text? Is this a problem with the library or with my code? Or is there a better free library I can use to get more accurate results?
I got the image from a PDF, using Ghostscript to render it at good quality, so the OCR library should be able to get the correct data.
link to download the image
website I use to get the accurate ocr

I am not experienced with C++, but I think your problem most likely relates to this line:
tess.Init(NULL, lang, tesseract::OEM_DEFAULT);
The first argument must point to the tessdata folder. Instead of NULL you can pass the folder name, for example "C:/tessdata/". Again, I am not experienced with C++, so you will have to decide whether to use a slash "/" or a backslash "\" (in a C++ string literal a backslash has to be escaped as "\\"). This folder should contain the language file(s).

As Eddge mentioned in his comment, you should apply some image preprocessing; there are a bunch of scripts for ImageMagick.
And of course OpenCV will help a lot with this as well.
The next point could be the PSM mode, which by default should satisfy your needs to extract whole-page information.
Also, the result of the online OCR is not 100% accurate, as you claimed:
There is "1 S Days" instead of "15 Days"
There is "Mail: finance(a)" instead of "E Mail: finance#"
There is "TiA THE GREEN HOL1 5" instead of "T/A THE GREEN HOU 5"
etc.
Which Tesseract version are you using? I highly recommend using 3.05 (4.0 shows much better results, but it is not officially released yet).
Also the following link could help you with your results: https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
P.S. I hope you are allowed to share such financial documents publicly ;)
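As an illustration of what "image preprocessing" can mean, here is a minimal sketch of one common step, global binarization with Otsu's method, written in plain C++ on an 8-bit grayscale buffer. It is not what ImageMagick or OpenCV do internally, the function names are my own, and whether binarization helps on this particular image is something you would have to try.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Picks a global threshold for 8-bit grayscale pixels with Otsu's method:
// the threshold that maximises the between-class variance.
int otsuThreshold(const std::vector<std::uint8_t>& gray) {
    std::array<std::uint64_t, 256> hist{};
    for (std::uint8_t v : gray) ++hist[v];

    const double total = static_cast<double>(gray.size());
    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i) sumAll += i * static_cast<double>(hist[i]);

    double sumBack = 0.0, weightBack = 0.0, bestVar = -1.0;
    int best = 0;
    for (int t = 0; t < 256; ++t) {
        weightBack += static_cast<double>(hist[t]);
        if (weightBack == 0.0) continue;         // no pixels at or below t yet
        const double weightFore = total - weightBack;
        if (weightFore == 0.0) break;            // everything is at or below t
        sumBack += t * static_cast<double>(hist[t]);
        const double diff = sumBack / weightBack - (sumAll - sumBack) / weightFore;
        const double var = weightBack * weightFore * diff * diff;
        if (var > bestVar) { bestVar = var; best = t; }
    }
    return best;
}

// Pixels brighter than the threshold become white (255), the rest black (0).
std::vector<std::uint8_t> binarize(const std::vector<std::uint8_t>& gray) {
    const int t = otsuThreshold(gray);
    std::vector<std::uint8_t> out(gray.size());
    for (std::size_t i = 0; i < gray.size(); ++i) out[i] = gray[i] > t ? 255 : 0;
    return out;
}
```

The binarized buffer can then be handed to the OCR engine in place of the original grayscale pixels.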

Related

Is there a way to "prime" tesseract or other OCR engines for certain words

Is there a way to prime Tesseract-OCR, or perhaps another engine, to have increased sensitivity to certain words/shapes? Priming is a way that humans can increase their sensitivity towards certain stimuli; I wasn't sure whether OCR engines can do the same thing. I know apps like Facebook/Instagram can increase sensitivity towards certain posts or behaviors for certain accounts if the account has exhibited that behavior in the past.
Users used to be able to specify a user-words file, but the feature does not seem to work in the latest versions of Tesseract, except maybe in legacy mode.
https://github.com/tesseract-ocr/tesseract/issues/960
The user-words file is a bit finicky to get working.
Here's a reduced version of the code I used to get it working:
#include <tesseract/genericvector.h>
.
.
.
const char* TESSDATA = "C:/Tesseract/tessdata/";
void TryTess() {
tesseract::TessBaseAPI* api = new tesseract::TessBaseAPI();
GenericVector<STRING> pars_vec;
pars_vec.push_back("load_system_dawg");
pars_vec.push_back("load_freq_dawg");
pars_vec.push_back("load_punc_dawg");
pars_vec.push_back("load_number_dawg");
pars_vec.push_back("load_unambig_dawg");
pars_vec.push_back("load_bigram_dawg");
//pars_vec.push_back("load_fixed_length_dawgs");
pars_vec.push_back("language_model_penalty_non_dict_word");
pars_vec.push_back("user_words_suffix");
pars_vec.push_back("user_patterns_suffix");
GenericVector<STRING> pars_values;
pars_values.push_back("0");
pars_values.push_back("0");
pars_values.push_back("0");
pars_values.push_back("0");
pars_values.push_back("0");
pars_values.push_back("0");
//pars_values.push_back("F");
pars_values.push_back("9999999999999999");
pars_values.push_back("user-words");
pars_values.push_back("user-patterns");
api->Init(TESSDATA, "eng", tesseract::OEM_DEFAULT, NULL, 0, &pars_vec, &pars_values, false);
// Some image preprocessing to improve detection, then hand the image to
// the API (e.g. with api->SetImage) before asking for the text
char* out = api->GetUTF8Text();
std::cout << "Result: " << out;
api->End();
delete[] out;
delete api;
}
Make sure you have your TESSDATA path configured. The best few resources I could find were here, as well as here.
The major hang-up was not knowing where the genericvector.h header was, as this overload of Tesseract's Init method requires GenericVector (there don't seem to be any conversion methods). Since the user-words file must be passed in prior to initialization, this is the only way I could find to do it. Even reading from a config file must be done after initialization, which prevents you from using user-words.
Good luck!
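For completeness, here is a small sketch of how the word list itself could be written. It assumes, as I understand the suffix parameters, that with user_words_suffix set to "user-words" Tesseract looks for <tessdata>/<lang>.user-words with one word per line; writeUserWords is a made-up helper name, so check the file location against your Tesseract version.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes one word per line to <tessdataDir>/<lang>.<suffix>.
// The file-name pattern is an assumption about how the suffix is resolved.
bool writeUserWords(const std::filesystem::path& tessdataDir,
                    const std::string& lang,
                    const std::string& suffix,
                    const std::vector<std::string>& words) {
    std::ofstream out(tessdataDir / (lang + "." + suffix));
    if (!out) return false;               // directory missing or not writable
    for (const std::string& w : words) out << w << '\n';
    out.flush();
    return out.good();
}
```

With the TESSDATA path from the code above, writeUserWords(TESSDATA, "eng", "user-words", {...}) would have to run before api->Init.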

Decrease in the quality of the image in flycapture

I am using the FlyCapture SDK sample program to capture images from a FlyCapture camera.
My problem is that when I capture an image using the installed FlyCapture application, the size of the image is about 1.3 - 1.5 MB. But when I take the same image using my program, which is based on the FlyCapture sample program, the size of the image is about 340 KB to 500 KB (max). The image format is .tiff.
There is a reduction in the quality of the image, due to which my program is not able to get any useful information from the image.
I am using the following approach to save the image:
FlyCapture2::Camera camera;
FlyCapture2::Image image;
camera.RetrieveBuffer(&image);
ostringstream saveImage;
saveImage << "Image-" << "-" << i << ".tiff";
image.Save(saveImage.str().c_str());
And with the Windows application I follow the approach described at this link:
http://www.ptgrey.com/Content/Images/uploaded/FlyCapture2Help/flycapture/03demoprogram/saving%20images_flycap2.html
Please let me know if any other details are required.
I am not 100% sure about this, since the documentation I found was for Java and not C++, but it is probably very similar.
You are using:
image.Save(saveImage.str().c_str());
to save your image, but are you sure it is saved as a TIFF? The documentation (the Java one) doesn't go deep into this, and I am not sure whether, like OpenCV's imwrite, it automatically deduces the type from the file name. So you should check that. There is one overload to which you can pass the ImageFileFormat; this should be set to the TIFF one.
Another overload lets you specify the TIFF options, where you can choose a different compression method. Notice that there is a JPEG compression method, which would produce a much smaller but lossy file. You could try None, or LZW, the one that OpenCV uses.
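To check what your program and the FlyCapture application actually wrote, you can read the Compression tag (259) from the two files and compare. Below is a rough sketch in plain C++ that parses the first image directory of a classic TIFF held in memory; tiffCompression is my own helper name and it ignores BigTIFF and multi-page files. Common values are 1 (none), 5 (LZW), 7 (JPEG), 8 (Deflate) and 32773 (PackBits).

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Reads the Compression tag (259) from the first IFD of a classic TIFF.
// Returns std::nullopt if the data is not a parseable TIFF or the tag is
// absent (a missing tag means "no compression" per the TIFF spec).
std::optional<std::uint16_t> tiffCompression(const std::vector<std::uint8_t>& d) {
    if (d.size() < 8) return std::nullopt;
    const bool little = d[0] == 'I' && d[1] == 'I';
    const bool big = d[0] == 'M' && d[1] == 'M';
    if (!little && !big) return std::nullopt;

    // Byte-order aware readers for 16- and 32-bit fields.
    auto u16 = [&](std::uint64_t p) -> std::uint16_t {
        return little ? static_cast<std::uint16_t>(d[p] | (d[p + 1] << 8))
                      : static_cast<std::uint16_t>((d[p] << 8) | d[p + 1]);
    };
    auto u32 = [&](std::uint64_t p) -> std::uint32_t {
        const std::uint32_t a = u16(p), b = u16(p + 2);
        return little ? (a | (b << 16)) : ((a << 16) | b);
    };

    if (u16(2) != 42) return std::nullopt;            // TIFF magic number
    const std::uint64_t ifd = u32(4);                 // offset of the first IFD
    if (ifd + 2 > d.size()) return std::nullopt;
    const std::uint64_t count = u16(ifd);             // number of 12-byte entries
    for (std::uint64_t i = 0; i < count; ++i) {
        const std::uint64_t entry = ifd + 2 + i * 12;
        if (entry + 12 > d.size()) return std::nullopt;
        // Entry layout: tag(2) type(2) count(4) value(4); a single SHORT
        // value sits in the first two bytes of the value field.
        if (u16(entry) == 259) return u16(entry + 8);
    }
    return std::nullopt;
}
```

Read each file into a std::vector<std::uint8_t> (for example with std::ifstream in binary mode) and compare the two results; if your file reports 7 and the application's reports 1, that would explain the size and quality difference.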

vtkImageData to DcmDataset

I hold a volume image in a vtkImageData and need to convert it to a DcmDataset (DCMTK). I know that I need to set general DICOM tags like patient data on the data set. That's not the problem.
In particular, I'm interested in putting the pixel data into the DcmDataset. Does anybody know of an example, or can someone explain how to do that?
Thanks in advance
Quoting from the DCMTK FAQ:
Is there a tool that converts common graphic formats like PGM/PPM,
PNG, TIFF, JPEG or BMP to DICOM?
No, unfortunately, there is no such tool in DCMTK. Currently, you have to write your own little program for that purpose.
The following code snippet from the toolkit's documentation could be a starting point:
char uid[100];
DcmFileFormat fileformat;
DcmDataset *dataset = fileformat.getDataset();
dataset->putAndInsertString(DCM_SOPClassUID, UID_SecondaryCaptureImageStorage);
dataset->putAndInsertString(DCM_SOPInstanceUID, dcmGenerateUniqueIdentifier(uid, SITE_INSTANCE_UID_ROOT));
dataset->putAndInsertString(DCM_PatientsName, "Doe^John");
/* ... */
dataset->putAndInsertUint8Array(DCM_PixelData, pixelData, pixelLength);
OFCondition status = fileformat.saveFile("test.dcm", EXS_LittleEndianExplicit);
if (status.bad())
cerr << "Error: cannot write DICOM file (" << status.text() << ")" << endl;
The current snapshot of the DCMTK (> version 3.5.4) contains a new
command line tool "img2dcm" that allows for converting JPEG images to
certain DICOM image SOP classes.
I would perhaps look first at the source code for img2dcm (documented here) to see the general process and then post back with any specific questions. IMHO, DCMTK is very powerful but extremely difficult to understand.
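Before the putAndInsertUint8Array call in the snippet above, the voxels have to be in one contiguous buffer, frame after frame and row after row, and Rows, Columns and NumberOfFrames have to match it. Here is a rough sketch of that preparation step in plain C++, assuming 8-bit scalars stored x-fastest (then y, then z) as vtkImageData does. The struct and function names are made up, and the flipRows option is there because VTK images often have their first row at the bottom while DICOM expects the top row first; whether you need it depends on how your volume was produced.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical container for what the DICOM image pixel attributes need.
struct DicomPixels {
    std::uint16_t rows;              // Rows (0028,0010)
    std::uint16_t columns;           // Columns (0028,0011)
    std::uint32_t frames;            // NumberOfFrames (0028,0008)
    std::vector<std::uint8_t> data;  // frame after frame, row after row
};

// voxels: nx * ny * nz 8-bit values, x varying fastest, then y, then z.
DicomPixels toDicomPixels(const std::vector<std::uint8_t>& voxels,
                          int nx, int ny, int nz, bool flipRows) {
    if (nx <= 0 || ny <= 0 || nz <= 0 ||
        voxels.size() != static_cast<std::size_t>(nx) * ny * nz)
        throw std::invalid_argument("voxel count does not match dimensions");

    DicomPixels out{static_cast<std::uint16_t>(ny), static_cast<std::uint16_t>(nx),
                    static_cast<std::uint32_t>(nz),
                    std::vector<std::uint8_t>(voxels.size())};
    for (int z = 0; z < nz; ++z) {
        for (int y = 0; y < ny; ++y) {
            const int srcRow = flipRows ? ny - 1 - y : y;
            const std::size_t src = (static_cast<std::size_t>(z) * ny + srcRow) * nx;
            const std::size_t dst = (static_cast<std::size_t>(z) * ny + y) * nx;
            std::copy_n(voxels.begin() + src, nx, out.data.begin() + dst);
        }
    }
    return out;
}
```

out.data.data() and out.data.size() would then be the pixelData and pixelLength arguments, and the three counts go into the corresponding tags.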

Converting PDF to JPG like Photoshop quality - Commercial C++ / Delphi library

For the implementation of a Windows-based page-flip application I need to be able to convert a large number of PDF pages into good-quality JPGs, not just thumbnails.
The aim is to achieve the best quality / file size ratio, similar to what Photoshop's Save for Web does.
Currently I'm using the Datalogics Adobe PDF Library SDK, which does not seem to be able to fulfill that task. I am thus looking for an alternative commercial C++ or Delphi library which provides a good quality / size / speed trade-off.
After doing some searching here, I noticed that most posts are about GS & ImageMagick, which I have also tested, but I am not satisfied with the output and the speed.
The target is to import the PDFs at 300 dpi and convert them with JPG quality 50, 1500 px height and an output size of 300-500 KB.
If anyone could point out a good library for that task, I would be most grateful.
The Gnostice PDFtoolKit VCL may be a candidate. Convert to JPEG is one of the options.
I always recommend Graphics32 for all your image manipulation needs; you have several resamplers to choose. However, I don't think it can read PDF files as images. But if you can generate the big image yourself it may be a good choice.
Atalasoft DotImage (with the PDF rasterizer add-on) will do that (I work on PDF technologies there). You'd be working in C# (or another .NET language):
void ConvertToJpegs(string outFileStem, Stream pdf)
{
JpegEncoder encoder = new JpegEncoder();
encoder.Quality = 50;
int page = 1;
PdfImageSource source = new PdfImageSource(pdf);
source.Resolution = 300; // sets the rendering resolution to 300 dpi
// larger numbers means better resolution in the image, but will cost in
// terms of output file size - as resolution increases, memory used increases
// as a function of the square of the resolution, whereas compression only
// saves maybe a flat 30% of the total image size, depending on the Quality
// setting on the encoder.
while (source.HasMoreImages()) {
AtalaImage image = source.AcquireNext();
// this image will be in either 8 bit gray or 24 bit rgb depending
// on the page contents.
try {
string path = String.Format("{0}{1}.jpg", outFileStem, page++);
// if you need to resample the image, this is the place to do it
image.Save(path, encoder, null);
}
finally {
source.Release(image);
}
}
}
There is also Quick PDF Library
Have a look at DynaPDF. I know it's pretty expensive, but you can try the starter pack.
P.S.: Before buying a product, please make sure it meets your needs.
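Whichever library you pick, it helps to know how the numbers in the question relate to each other. PDF page sizes are given in points (72 per inch), so the pixel size at a given dpi, and the dpi that yields exactly 1500 px of height, are simple arithmetic. A small sketch (the names are mine, and the A4 page size in the usage note is just an example):

```cpp
#include <cmath>

struct RenderPlan {
    double dpi;    // resolution that yields the target height directly
    int widthPx;   // resulting width, aspect ratio preserved
    int heightPx;  // the requested height
};

// Pixel length of a page dimension given in points at a rendering resolution.
int pixelsAt(double lengthPt, double dpi) {
    return static_cast<int>(std::lround(lengthPt * dpi / 72.0));
}

// Resolution and width needed so the page comes out targetHeightPx tall.
RenderPlan planForHeight(double pageWidthPt, double pageHeightPt, int targetHeightPx) {
    const double dpi = targetHeightPx * 72.0 / pageHeightPt;
    return {dpi, pixelsAt(pageWidthPt, dpi), targetHeightPx};
}
```

For an A4 page (595 x 842 pt), pixelsAt(842, 300) is 3508, while planForHeight(595, 842, 1500) gives roughly 128 dpi and a width of 1060 px. Rendering at 300 dpi and scaling down to 1500 px is therefore a bit more than 2x supersampling, which costs time and memory; depending on the rasterizer's anti-aliasing, rendering closer to the target resolution may be a reasonable speed trade-off.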

converting a binary stream into a png format

I will try to be clear.
My project idea is as follows:
I implemented several compression algorithms in C++, applied them to a text file, and then applied several encryption algorithms to the compressed files. Now I am left with the final step, which is converting these encrypted files to some image format (I am thinking about PNG since it's the clearest one).
My question is:
How can I transform a binary stream into PNG format?
I know the image will look like rubbish.
I want the binary stream to be converted to PNG format so I can view it as an image.
I am using C++; I hope someone out there can help me.
( my previous thread which was closed )
https://stackoverflow.com/questions/5773638/converting-a-text-file-to-any-format-of-images-png-etc-c
Thanks in advance
Help19
If you really really must store your data inside a PNG, it's better to use a 3rd party library like OpenCV to do the work for you. OpenCV will let you store your data and save it on the disk as PNG or any other format that it supports.
The code to do this would look something like this:
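#include <cv.h>
#include <highgui.h>
#include <cstring>
#include <iostream>
// channels = number of 8-bit channels per pixel (1 = grayscale, 3 = color)
IplImage* out_image = cvCreateImage(cvSize(width, height), IPL_DEPTH_8U, channels);
// cvCreateImage already allocates the pixel buffer, so copy your data (here
// a tightly packed char* called data) into it row by row; rows in an
// IplImage are widthStep bytes apart
for (int y = 0; y < height; ++y)
    std::memcpy(out_image->imageData + y * out_image->widthStep,
                data + y * width * channels, width * channels);
if (!cvSaveImage("fake_picture.png", out_image))
{
std::cout << "ERROR: Failed cvSaveImage" << std::endl;
}
cvReleaseImage(&out_image);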
The code above is just to give you an idea of how to do what you need using OpenCV.
I think you'd be better served by a two-dimensional bar code than by converting your blob of data into a PNG image.
One of the codes that you could use is the QR code.
To do what you have in mind (storing data in an image), you'll need a lossless image format. PNG is a good choice for this. libpng is the official PNG encoding library. It's written in C, so you should be able to easily interface it with your C++ code. The homepage I linked you to contains links to both the source code so you can compile libpng into your project as well as a manual on how to use it. A few quick notes on using libpng:
It uses setjmp and longjmp for error handling. It's a little weird if you haven't worked with C's long jump functionality before, but the manual provides a few good examples.
It uses zlib for compression, so you'll also have to compile that into your project.
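Whichever encoder you end up using, the byte stream first has to be laid out as a rectangular grid of pixels. Here is a minimal sketch of that step for an 8-bit grayscale image in plain C++; the Raster struct and packIntoRaster are names I made up, and the roughly square layout with zero padding is just one possible choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: an 8-bit grayscale image, one byte per pixel.
struct Raster {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint8_t> pixels;  // width * height bytes, row after row
};

// Lays the bytes out as a roughly square image, padding the last row with zeros.
Raster packIntoRaster(const std::vector<std::uint8_t>& bytes) {
    const std::size_t n = bytes.size();
    std::size_t width = 1;
    while (width * width < n) ++width;                      // smallest width with width^2 >= n
    std::size_t height = (n + width - 1) / width;           // rows needed, rounded up
    if (height == 0) height = 1;                            // keep a valid 1x1 image for empty input

    Raster r{width, height, bytes};
    r.pixels.resize(width * height, 0);                     // zero padding at the end
    return r;
}
```

Because of the padding, the original length has to be stored somewhere (for instance in the first few pixels or in the file name) if you want to get exactly the same bytes back out of the image.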