I'm having a hard time getting Tesseract to recognize alphanumeric text. These are not dictionary words, and I've set load_system_dawg and load_freq_dawg to false, as recommended in the docs.
Sample code showing what I'm doing:
// Compile with: g++ test.cpp $(pkg-config --cflags --libs tesseract opencv)
#include <tesseract/baseapi.h>
#include <tesseract/genericvector.h>
#include <opencv2/opencv.hpp>
#include <iostream> // std::cout
#include <memory>   // std::unique_ptr
#include <string>   // std::string
int main()
{
    /* Disabling the dictionaries Tesseract uses should increase recognition
     * if most of your text isn't dictionary words. They can be disabled by
     * setting both of the configuration variables load_system_dawg and
     * load_freq_dawg to false.
     * Source: https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
     */
    GenericVector<STRING> key;
    GenericVector<STRING> val;
    key.push_back("load_system_dawg");
    val.push_back("false");
    key.push_back("load_freq_dawg");
    val.push_back("false");
    // also tried setting tosp_min_sane_kn_sp to both large and small values
#if 0
    // unfortunately, this doesn't work in tesseract v4.0
    // https://github.com/tesseract-ocr/tesseract/issues/751
    key.push_back("tessedit_char_whitelist");
    val.push_back("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789");
#endif
    std::unique_ptr<tesseract::TessBaseAPI> tess(new tesseract::TessBaseAPI());
    tess->Init(nullptr, "eng", tesseract::OEM_DEFAULT, nullptr, 0, &key, &val, false);
    tess->SetPageSegMode(tesseract::PageSegMode::PSM_SINGLE_LINE);
    for (const auto filename : {"01.png", "02.png", "03.png", "04.png", "05.png", "06.png", "07.png", "08.png", "09.png", "10.png", "11.png", "12.png"})
    {
        cv::Mat mat = cv::imread(filename);
        tess->SetImage(mat.data, mat.cols, mat.rows, mat.channels(), mat.step1());
        char *text = tess->GetUTF8Text();
        std::string str;
        if (text != nullptr)
        {
            str = text;
        }
        delete [] text;
        std::cout << "OCR results for " << filename << ": " << str << std::endl;
        cv::imshow("image", mat);
        cv::waitKey();
    }
    tess->End();
    return 0;
}
The images I'm using and the results I'm seeing from tesseract:
"1K 45" (missing whitespace)
"1K"
"ZaP © 13" (top few pixels are missing so this one doesn't surprise me)
"308 8" ('B' vs '8' I understand, but the missing whitespace means we cannot interpret this text)
"23 B 18"
"" (blank!?)
"SZ2EC 5"
"SZ2EC 3"
"1201" (missing whitespace means we cannot interpret these results)
"a)"
"1 € 13"
"J4E 7"
I'm not looking for 100% recognition, but I think there are likely some configuration items in tesseract that I could be setting to get better results than what I'm currently seeing.
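For example, preprocessing with OpenCV before calling SetImage is one thing I haven't ruled out; here's a sketch of the idea (the 3x upscale and the Otsu thresholding are guesses on my part, not something the Tesseract docs prescribe):

cv::Mat preprocess(const cv::Mat &input)
{
    // Tesseract is commonly reported to work best at roughly 300 DPI,
    // so small crops may benefit from upscaling. The 3x factor is arbitrary.
    cv::Mat gray;
    cv::cvtColor(input, gray, cv::COLOR_BGR2GRAY);
    cv::Mat resized;
    cv::resize(gray, resized, cv::Size(), 3.0, 3.0, cv::INTER_CUBIC);
    // Otsu picks a global threshold automatically, which often helps with
    // noisy alphanumeric labels on uneven backgrounds.
    cv::Mat binary;
    cv::threshold(resized, binary, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);
    return binary;
}

The result would then be passed as a single-channel image: tess->SetImage(binary.data, binary.cols, binary.rows, 1, binary.step1());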
I'm using espeak-ng to turn German-language traffic messages into speech. See this example text:
B6 Weserstraße B71 Seeborg vorübergehende Begrenzung der Breite.
B213 Wildeshauser Landstraße Delmenhorst-Deichhorst wegen Baustelle gesperrt.
The espeak method call looks like this:
unsigned int spoken_message_uuid = 0;

espeak_ERROR Speak (wstring text)
{
    espeak_ERROR error = EE_OK;
    unsigned int *uuid = &spoken_message_uuid;
    const wchar_t *input = text.c_str ();
    wcout << L"Speaking text:" << endl << input << endl;
    error = espeak_Synth (input, text.length (), 0, POS_CHARACTER, 0, espeakCHARS_WCHAR | espeakENDPAUSE | espeakSSML, uuid, NULL);
    return error;
}
My issue is now the following: none of the German special characters (ä, ö, ü, ß) are spoken correctly! Instead, something like A Tilde ein Viertel ("A tilde one quarter") shows up in the spoken output, as if UTF-8 text had erroneously been treated as ASCII.
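That symptom is classic mojibake; here is a minimal sketch (assuming a Latin-1 console) of how the two UTF-8 bytes of ü become two separate characters:

#include <iostream>

int main()
{
    // "ü" encoded as UTF-8 is the byte pair 0xC3 0xBC. Interpreted as
    // Latin-1, those bytes are 'Ã' and '¼', which espeak-ng then dutifully
    // reads aloud as "A Tilde" and "ein Viertel".
    const char utf8_u_umlaut[] = { '\xC3', '\xBC', '\0' };
    std::cout << utf8_u_umlaut << std::endl;
    return 0;
}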
Here are the respective versions of espeak-ng and g++:
pi@autoradio:/import/valen/autoradio $ espeak-ng --version
eSpeak NG text-to-speech: 1.50 Data at: /usr/lib/arm-linux-gnueabihf/espeak-ng-data
pi@autoradio:/import/valen/autoradio $ g++ --version
g++ (Raspbian 6.5.0-1+rpi1+b1) 6.5.0 20181026
pi@autoradio:/import/valen/autoradio $ apt-cache policy espeak-ng
espeak-ng:
Installed: 1.50+dfsg-7~bpo10+1
Candidate: 1.50+dfsg-7~bpo10+1
Version table:
*** 1.50+dfsg-7~bpo10+1 100
100 http://deb.debian.org/debian buster-backports/main armhf Packages
100 /var/lib/dpkg/status
1.49.2+dfsg-8 500
500 http://raspbian.raspberrypi.org/raspbian buster/main armhf Packages
espeak-ng has been installed from Debian's buster-backports repo to replace version 1.49, which didn't work either. The voice I'm using is mb-de5.
OK, this is not exactly a solution, merely a workaround, but at least it works: I pass a string instead of a wstring. The original data turned out to be UTF-8-encoded, so all the special characters fit into a string (i.e. a char* buffer). Here is the adapted code:
unsigned int spoken_message_uuid = 0;

espeak_ERROR Speak (string text)
{
    espeak_ERROR error = EE_OK;
    unsigned int *uuid = &spoken_message_uuid;
    const char *input = text.c_str ();
    cout << "Speaking text:" << endl << input << endl;
    error = espeak_Synth (input, text.length (), 0, POS_CHARACTER, 0, espeakCHARS_UTF8 | espeakENDPAUSE | espeakSSML, uuid, NULL);
    return error;
}
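If other parts of the program still deliver wstring, a conversion sketch (std::wstring_convert is deprecated since C++17 but fine with this g++ 6.5 toolchain; the helper name is my own):

#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: convert a wide string to UTF-8 before handing it
// to the string-based Speak() above.
std::string ToUtf8 (const std::wstring &wide)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.to_bytes (wide);
}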
I am using Tesseract OCR to read German PNG images in C++, and I have problems with some special characters like
ß, ä, ö, ü, and so on.
Do I need to train Tesseract to read these correctly, or what needs to be done?
This is the part of the original image read by tesseract
tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
UPDATE
SetConsoleOutputCP(1252); // changed to the German code page
SetConsoleCP(1252); // changed to the German code page
wcout << "ÄÖÜ?ß" << endl;
// Open input image with leptonica library
Pix *image = pixRead("D:\\Images\\Document.png");
api->Init("D:\\TesseractBeispiele\\Tessaractbeispiel\\Tessaractbeispiel\\tessdata", "deu");
api->SetImage(image);
api->SetVariable("save_blob_choices", "T");
api->SetRectangle(1000, 3000, 9000, 9000);
api->Recognize(NULL);
// Get OCR result
wcout << api->GetUTF8Text();
After changing to the code below the UPDATE, the hard-coded umlauts are shown correctly, but the text from the image still isn't correct. What do I need to change?
tesseract version is 3.0.2
leptonica version is 1.68
Tesseract can recognize Unicode characters. Your console may not have been configured to display them.
What encoding/code page is cmd.exe using?
Unicode characters in Windows command line - how?
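A sketch to test this, assuming you want to keep Tesseract's output in UTF-8: switch the console to code page 65001 and print through the narrow stream (a TrueType console font is also required):

#include <windows.h>
#include <cstdio>

int main()
{
    // CP_UTF8 is code page 65001; SetConsoleOutputCP only affects output.
    SetConsoleOutputCP(CP_UTF8);
    // GetUTF8Text() already returns UTF-8 bytes, so print them unmodified
    // instead of widening them through wcout:
    const char *sample = "\xC3\x84\xC3\x96\xC3\x9C\xC3\x9F"; // "ÄÖÜß" in UTF-8
    printf("%s\n", sample);
    return 0;
}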
I don't know how to detect German words from an image in a Windows environment, but I know how to do it in a Linux environment. The following code may give you some idea.
/*
* word_OCR.cpp
*
* Created on: Jun 23, 2016
* Author: root
*/
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <iostream>
using namespace std;
int main(int argc, char **argv)
{
    Pix *image = pixRead(argv[1]);
    if (image == 0) {
        cout << "Cannot load input file!\n";
        return 1; // bail out instead of passing a null image to Tesseract
    }
    tesseract::TessBaseAPI tess;
    // instead of passing "eng", pass "deu"
    if (tess.Init("/usr/share/tesseract/tessdata", "deu")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        exit(1);
    }
    tess.SetImage(image);
    tess.Recognize(0);
    tesseract::ResultIterator *ri = tess.GetIterator();
    tesseract::PageIteratorLevel level = tesseract::RIL_WORD;
    if (ri != 0)
    {
        do {
            const char *word = ri->GetUTF8Text(level);
            cout << word << endl;
            delete [] word;
        } while (ri->Next(level));
        delete ri; // single object, so plain delete rather than delete[]
    }
}
One thing you have to take care of: pass a good-resolution image, and then it works fine.
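A sketch of what that can mean in practice, upscaling a small source image with Leptonica's pixScale before recognition (the 2.0 factor is only an example; tune it for your scans):

#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <cstdio>

int main(int argc, char **argv)
{
    if (argc < 2) return 1;
    Pix *image = pixRead(argv[1]);
    if (image == 0) return 1;
    // Upscale 2x in both directions; Tesseract tends to do better when
    // the text is rendered at roughly 300 DPI.
    Pix *scaled = pixScale(image, 2.0f, 2.0f);
    tesseract::TessBaseAPI tess;
    if (tess.Init("/usr/share/tesseract/tessdata", "deu")) return 1;
    tess.SetImage(scaled);
    char *text = tess.GetUTF8Text();
    printf("%s\n", text);
    delete [] text;
    pixDestroy(&scaled);
    pixDestroy(&image);
    return 0;
}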
I'm trying to store a string with special chars:
qDebug() << "ÑABCgÓ";
Outputs (here I can't even type the correct output; some garbage is missing after the à and Ã):
ÃABCgÃ
I suspect some UTF-8 / Latin-1 / ASCII mix-up, but I can't find the setting to output correctly to the console / a file. What I have written in my code is: "ÑABCgÓ".
(Qt 4.8.5 / Ubuntu 12.04 / C++98)
You could use the static method QString QString::fromUtf8(const char *str, int size = -1), as the sample code below shows. This is one of the main reasons why QString exists.
See the documentation for details:
http://qt-project.org/doc/qt-5.1/qtcore/qstring.html#fromUtf8
main.cpp
#include <QString>
#include <QDebug>
int main()
{
    qDebug() << QString::fromUtf8("ÑABCgÓ");
    return 0;
}
Building (customize for your scenario)
g++ -fPIC -I/usr/include/qt -I/usr/include/qt/QtCore main.cpp -lQt5Core && ./a.out
Output
"ÑABCgÓ"
That being said, depending on your locale, simply qDebug() << "ÑABCgÓ"; could work as well, but it is recommended to be explicit and ask for UTF-8 handling.
Try this (note that QTextCodec::setCodecForCStrings exists only in Qt 4; it was removed in Qt 5):
QTextCodec *codec = QTextCodec::codecForName("UTF-8");
QTextCodec::setCodecForCStrings(codec);
qDebug() << "ÑABCgÓ";
This question is similar to cuModuleLoadDataEx options, but I would like to bring the topic up again and provide more information.
When loading a PTX string with the NV driver via cuModuleLoadDataEx, it seems to ignore all options altogether. I provide full working examples so that anyone interested can directly and effortlessly reproduce this. First, a small PTX kernel (save this as small.ptx), then the C++ program that loads it.
.version 3.1
.target sm_20, texmode_independent
.address_size 64
.entry main()
{
ret;
}
main.cc
#include<cstdlib>
#include<iostream>
#include<fstream>
#include<sstream>
#include<string>
#include<map>
#include "cuda.h"
int main(int argc, char *argv[])
{
    CUdevice cuDevice;
    CUcontext cuContext;
    CUfunction func;
    CUresult ret;
    CUmodule cuModule;
    cuInit(0);
    std::cout << "trying to get device 0\n";
    ret = cuDeviceGet(&cuDevice, 0);
    if (ret != CUDA_SUCCESS) { exit(1); }
    std::cout << "trying to create a context\n";
    ret = cuCtxCreate(&cuContext, 0, cuDevice);
    if (ret != CUDA_SUCCESS) { exit(1); }
    std::cout << "loading PTX string from file " << argv[1] << "\n";
    std::ifstream ptxfile( argv[1] );
    std::stringstream buffer;
    buffer << ptxfile.rdbuf();
    ptxfile.close();
    std::string ptx_kernel = buffer.str();
    std::cout << "Loading PTX kernel with driver\n" << ptx_kernel;
    const unsigned int jitNumOptions = 3;
    CUjit_option *jitOptions = new CUjit_option[jitNumOptions];
    void **jitOptVals = new void*[jitNumOptions];
    // set up size of compilation log buffer
    jitOptions[0] = CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES;
    int jitLogBufferSize = 1024*1024;
    jitOptVals[0] = (void *)&jitLogBufferSize;
    // set up pointer to the compilation log buffer
    jitOptions[1] = CU_JIT_INFO_LOG_BUFFER;
    char *jitLogBuffer = new char[jitLogBufferSize];
    jitOptVals[1] = jitLogBuffer;
    // set up wall clock time
    jitOptions[2] = CU_JIT_WALL_TIME;
    float jitTime = -2.0;
    jitOptVals[2] = &jitTime;
    ret = cuModuleLoadDataEx( &cuModule , ptx_kernel.c_str() , jitNumOptions, jitOptions, (void **)jitOptVals );
    if (ret != CUDA_SUCCESS) { exit(1); }
    std::cout << "walltime: " << jitTime << "\n";
    std::cout << std::string(jitLogBuffer) << "\n";
}
Build (assuming CUDA is installed under /usr/local/cuda, I use CUDA 5.0):
g++ -I/usr/local/cuda/include -L/usr/local/cuda/lib64/ main.cc -o main -lcuda
If someone is able to extract any sensible information from the compilation process, that would be great! The documentation of the CUDA driver API, where cuModuleLoadDataEx is explained (and which options it is supposed to accept), is at http://docs.nvidia.com/cuda/cuda-driver-api/index.html
If I run this, the log is empty and jitTime wasn't even touched by the NV driver:
./main small.ptx
trying to get device 0
trying to create a context
loading PTX string from file small.ptx
Loading PTX kernel with driver
.version 3.1
.target sm_20, texmode_independent
.address_size 64
.entry main()
{
ret;
}
walltime: -2
EDIT:
I managed to get the JIT compile time. However, it seems that the driver expects an array of 32-bit values as OptVals, not, as stated in the manual, an array of pointers (void *), which are 64 bits on my system. So, this works:
const unsigned int jitNumOptions = 1;
CUjit_option *jitOptions = new CUjit_option[jitNumOptions];
int *jitOptVals = new int[jitNumOptions];
jitOptions[0] = CU_JIT_WALL_TIME;
// here the call to cuModuleLoadDataEx
// the driver writes the float's bit pattern into the 32-bit slot:
std::cout << "walltime: " << *reinterpret_cast<float *>(&jitOptVals[0]) << "\n";
I believe that it is not possible to do the same with an array of void *. The following code does not work:
const unsigned int jitNumOptions = 1;
CUjit_option *jitOptions = new CUjit_option[jitNumOptions];
void **jitOptVals = new void*[jitNumOptions];
jitOptions[0] = CU_JIT_WALL_TIME;
// here the call to cuModuleLoadDataEx
// here I also would have a problem casting a 64 bit void * to a float (32 bit)
EDIT
Looking at the JIT compilation time in jitOptVals[0] was misleading. As mentioned in the comments, the JIT compiler caches previous translations and won't update the JIT compile time if it finds a cached compilation. Since I was checking whether this value changed, I assumed the call ignored the options altogether. It doesn't; it works fine.
Your jitOptVals should not contain pointers to your values; instead, cast the values themselves to void*:
// set up size of compilation log buffer
jitOptions[0] = CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES;
int jitLogBufferSize = 1024*1024;
jitOptVals[0] = (void *)(size_t)jitLogBufferSize; // the value itself, widened to pointer size
// set up pointer to the compilation log buffer
jitOptions[1] = CU_JIT_INFO_LOG_BUFFER;
char *jitLogBuffer = new char[jitLogBufferSize];
jitOptVals[1] = jitLogBuffer;
// set up wall clock time
jitOptions[2] = CU_JIT_WALL_TIME;
float jitTime = -2.0;
// Keep jitOptVals[2] empty, as it is only an output value:
//jitOptVals[2] = (void*)jitTime;
and after cuModuleLoadDataEx, you read jitTime back out of the option values array, e.g. jitTime = *reinterpret_cast<float *>(&jitOptVals[2]);
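To avoid the pointer-type pun, a strict-aliasing-safe sketch of reading the returned float back out of the void* slot:

#include <cstring> // std::memcpy

float jitTime = 0.0f;
// the driver overwrote this slot with a float's bit pattern
std::memcpy(&jitTime, &jitOptVals[2], sizeof(jitTime));
std::cout << "walltime: " << jitTime << " ms\n";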
I'm using boost::program_options to get parameters from a config file.
I understand that I can create a file by hand and program_options will parse it, but I'm looking for a way for the program to generate the file automatically, i.e. print out the name of each option and its value. For example:
>./main
Run without options, it would generate an init.cfg that looks like this:
[wave packet]
width = 1
position = 2.0
[calculation parameters]
levels = 15
Then I would go into that file, change the values using a text editor, and use the file:
>./main init.cfg
A nice way to approach this would be for variables_map to have an operator<<. That way I could just write it to a file, change the values, and read the file back, all in the same format, with no need to write each line myself.
I couldn't find anything like that in the documentation or examples. Please let me know if this is possible.
EDIT: Sam Miller showed how to parse the ini file in sections. However, I still have a problem getting the values from boost::program_options::variables_map vm. I tried the following:
for (po::variables_map::iterator it = vm.begin(); it != vm.end(); ++it)
{
    if (it->first != "help" && it->first != "config")
        cout << "first - " << it->first << ", second - " << it->second.value() << "\n";
}
With it->second.value() I got an error; I also tried it->second and got an error as well:
icpc -lboost_serialization -lboost_program_options -c programOptions.cc
programOptions.cc(60): error: no operator "<<" matches these operands
operand types are: std::basic_ostream<char, std::char_traits<char>> << boost::any
cout << "first - " << it->first << ", second - " << it->second.value() << "\n";
^
compilation aborted for programOptions.cc (code 2)
make: *** [programOptions.o] Error 2
I get the value correctly if I use it->second.as<int>(), but not all of my values are ints, and once I reach a double, the program crashes with this:
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::bad_any_cast> >'
what(): boost::bad_any_cast: failed conversion using boost::any_cast
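A type-dispatch sketch that would avoid the bad_any_cast, assuming every option is an int, double, or std::string (anything else needs its own branch):

for (po::variables_map::iterator it = vm.begin(); it != vm.end(); ++it)
{
    const boost::any &v = it->second.value();
    cout << "first - " << it->first << ", second - ";
    if (v.type() == typeid(int))
        cout << boost::any_cast<int>(v);
    else if (v.type() == typeid(double))
        cout << boost::any_cast<double>(v);
    else if (v.type() == typeid(std::string))
        cout << boost::any_cast<std::string>(v);
    else
        cout << "<unhandled type>";
    cout << "\n";
}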
There's not a way using program_options that I'm aware of. You could use the property tree library to write the ini file.
Here is a short example:
macmini:stackoverflow samm$ cat property.cc
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/ini_parser.hpp>
#include <iostream>
int
main()
{
    using boost::property_tree::ptree;
    ptree root;
    ptree wave_packet;
    wave_packet.put( "width", "1" );
    wave_packet.put( "position", "2.0" );
    ptree calculation_parameters;
    calculation_parameters.put( "levels", "15" );
    root.push_front(
        ptree::value_type( "calculation parameters", calculation_parameters )
    );
    root.push_front(
        ptree::value_type( "wave packet", wave_packet )
    );
    write_ini( std::cout, root );
    return 0;
}
macmini:stackoverflow samm$ g++ property.cc
macmini:stackoverflow samm$ ./a.out
[wave packet]
width=1
position=2.0
[calculation parameters]
levels=15
macmini:stackoverflow samm$
As far as I understand, the question is about how to write a config file based on a given options_description.
Here is a possible solution for writing one options_description to a config file. It relies on the fact that every parameter has a default value.
void SaveDefaultConfig()
{
    boost::filesystem::ofstream configFile(configFilePath_);
    auto descOptions = algorithmsDesc_.options();
    boost::property_tree::ptree tree;
    for (auto& option : descOptions)
    {
        std::string name = option->long_name();
        boost::any defaultValue;
        option->semantic()->apply_default(defaultValue);
        if (defaultValue.type() == typeid(std::string))
        {
            std::string val = boost::any_cast<std::string>(defaultValue);
            tree.put(name, val);
        }
        // Add additional else-if branches (type() == typeid(...)) here if necessary
    }
    // or write_ini
    boost::property_tree::write_json(configFile, tree);
}
Here algorithmsDesc_ is a boost::program_options::options_description, i.e. where you describe options like:
algorithmsDesc_.add_options()
    ("general.blur_Width", po::value<int>(&varWhereToStoreValue)->default_value(3), "Gaussian blur aperture width");
The problem is if you need sections in the config file: options_description doesn't have a method to get the caption passed through its constructor. A dirty way to get it is to cut it from the output stream produced by print():
std::string getSectionName()
{
    std::stringstream ss;
    algorithmDesc_.print(ss);
    std::string caption;
    std::getline(ss, caption);
    // cut the trailing ':'
    return caption.substr(0, caption.size() - 1);
}
Combining them together is straightforward.
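For example, a sketch that prefixes every option with the section caption so that write_ini produces a [section] header (reusing the members from SaveDefaultConfig above; only the std::string branch is shown):

void SaveDefaultConfigWithSections()
{
    boost::filesystem::ofstream configFile(configFilePath_);
    boost::property_tree::ptree tree;
    std::string section = getSectionName();
    for (auto& option : algorithmsDesc_.options())
    {
        boost::any defaultValue;
        option->semantic()->apply_default(defaultValue);
        if (defaultValue.type() == typeid(std::string))
        {
            // "section.name" becomes "[section]\nname=value" in INI output
            tree.put(section + "." + option->long_name(),
                     boost::any_cast<std::string>(defaultValue));
        }
    }
    boost::property_tree::write_ini(configFile, tree);
}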