tesseract-ocr: how to include baseapi.h - C++

I followed the instructions I found on the tesseract forum on how to include baseapi.h.
I am using:
VS2010
tesseract 3.01
I am trying to understand how to use baseapi.h.
test program:
#define __MSW32__
#include "baseapi.h"

using namespace tesseract;

int _tmain(int argc, _TCHAR* argv[])
{
    TessBaseAPI *myTestApi;
    myTestApi = new TessBaseAPI();
    //myTestApi->Init("d:/temp.jpg", "eng");
    return 0;
}
From the guide:
Add the following folders to Additional Include Directories (project properties) to resolve file-not-found issues after including "baseapi.h":
tesseract-3.01/api
tesseract-3.01/ccmain
tesseract-3.01/ccutil
tesseract-3.01/ccstruct
Added the following libs to Properties/Linker/Input/Additional Dependencies in order to use the Tesseract and Leptonica libs: libtesseract.lib; liblept.lib
Added the following paths to Properties/Linker/General/Additional Library Directories in order to find the Tesseract and Leptonica libs:
tesseract-3.01/vs2010/Release
tesseract-3.01/vs2008/lib
Then I tried to build. Since the linker could not find libtesseract.lib, I replaced it with libtesseract_tessopt.lib and built again:
1>------ Build started: Project: test4, Configuration: Debug Win32 ------
1> test4.cpp
1>test4.obj : error LNK2019: unresolved external symbol "public: __thiscall tesseract::TessBaseAPI::TessBaseAPI(void)" (??0TessBaseAPI@tesseract@@QAE@XZ) referenced in function _wmain
1>c:\users\eran0708\documents\visual studio 2010\Projects\test4\Debug\test4.exe : fatal error LNK1120: 1 unresolved externals
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
Is there any known solution to this problem?
Thanks,
Eran

This is what I did to compile it:
1.) Copy all header files into one include directory, so that later only $(TESS_DIR)\include has to be added to the include directories.
Copy the leptonica headers into $(TESS_DIR)\include\leptonica.
2.) Open vs2010\tesseract.sln and compile all configurations. Then copy all lib files into $(TESS_DIR)\lib\debug and $(TESS_DIR)\lib\release, and add those directories to the build settings.
3.) Copy the compiled libtesseract.dll and liblept168.dll, as well as the tessdata folder (containing eng.traineddata), to the Release folder of your project.
4.) Add these libraries as additional dependencies:
libtesseract.lib
liblept168.lib
5.) #include <baseapi.h>
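With that setup in place, a minimal smoke test looks something like this (a sketch, not verbatim from the guide; the image name test.png and the tessdata location are assumptions to adjust for your machine):

#include <baseapi.h>
#include <allheaders.h>
#include <iostream>

int main()
{
    tesseract::TessBaseAPI api;
    // First argument is the parent directory of tessdata; NULL means the current directory.
    if (api.Init(NULL, "eng"))
    {
        std::cerr << "Could not initialize tesseract." << std::endl;
        return 1;
    }
    Pix *image = pixRead("test.png"); // hypothetical input image
    if (!image)
    {
        std::cerr << "Could not read test.png." << std::endl;
        return 1;
    }
    api.SetImage(image);
    char *text = api.GetUTF8Text();
    std::cout << text << std::endl;
    delete [] text;        // GetUTF8Text() allocates with new[]
    pixDestroy(&image);
    api.End();
    return 0;
}

If this links and runs, the include directories, libraries, and tessdata copy from the steps above are wired up correctly.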

I figured it out. If you are using Visual Studio 2010 and Windows Forms / the designer, you can add it easily this way with no issues:
1) Add the following projects to your solution (I am warning you once: do not add the tesseract solution, or change any setting in the projects you add, unless you love to hate yourself):
ccmain
ccstruct
ccutil
classify
cube
cutil
dict
image
libtesseract
neural_networks
textord
viewer
wordrec
You can add the others, but you don't really want all that built into your project, do you? Nah, build those separately.
2) Go to your project properties and add libtesseract as a reference; you can do that now that it is visible as a project. This will make your project build fast without examining the millions of warnings within tesseract. [Common Properties] -> [Add Reference]
3) Right-click your project in the Solution Explorer and click Project Dependencies; make sure it is dependent on libtesseract, or even on all of them. It just means they build before your project.
4) The tesseract 2010 Visual Studio projects contain a number of configurations, aka Release, Release.dll, Debug, and Debug.dll; it seems that the Release.dll settings produce the right files. First, set the solution output to Release.dll. Click your project properties, then click Configuration Manager. If that is not available, do this: click the SOLUTION's properties in the solution tree and click the Configuration tab; you will see a list of projects and their associated configuration settings. You will notice your project is not set to Release.dll even though the output is. (If you took the second route you still need to click Configuration Manager.) Then you can edit the settings: click New on your project's settings and call it Release.dll, exactly the same as the rest of them, and copy the settings from Release. Do the same thing for Debug, so that you have a Debug.dll configuration copied from the Debug settings. Whew... almost done.
5) Don't try to change tesseract's settings to match yours. That won't work, and when the new release comes out you won't be able to just "throw it in" and go. Accept the fact that in this state your new modes are Release.dll and Debug.dll. Don't stress out; you can go back when it is finished and remove the projects from your solution.
6) Guess where the libraries and DLLs come out? In your project. You may or may not need to add the library directories. Some people say to dump all the headers into a single folder so they only need to add one folder to the includes, but not me: I want to be able to delete the tesseract folder and reload it from the zips without extra work, and be fully ready to update in one move or restore it if I made a mess of the code. It is a bit of work, and you can do it with code instead of the settings (which is the way I do it), but you should include all the folders that contain header files within the 2010 tesseract project folder and leave them alone.
7) There is no need to add any files to your project, just these lines of code. I have included some additional code that converts from one foreign data set to the TIFF-friendly version with no need to save/load a file. Aren't I nice?
8) Now you can fully debug in Debug.dll and Release.dll. Once you have successfully built it into your project even once, you can remove all the added projects and it will be perfect: no extra compiling or errors, fully debuggable, all natural.
9) If I remember right, I could not get around the fact that I had to copy the files in 2008/lib/ into my project's Release folder... darn it.
In my project's "functions.h" I put:
#pragma comment( lib, "liblept.lib" )

#define _USE_TESSERACT_
#ifdef _USE_TESSERACT_
    #pragma comment( lib, "libtesseract.lib" )
    #include <baseapi.h>
#endif

#include <allheaders.h>
In my main project I put this in a class as a member:
tesseract::TessBaseAPI *readSomeNombers;
and of course I included "functions.h" somewhere.
Then I put this in my class's constructor:
readSomeNombers = new tesseract::TessBaseAPI();
readSomeNombers->Init( NULL, "eng" );
readSomeNombers->SetVariable( "tessedit_char_whitelist", "0123456789,." );
Then I created this class member function, and a class member to serve as an output; don't hate, I don't like returning variables, not my style. Note that the Pix and the buffer returned by GetUTF8Text() are heap allocations, so the function below releases them at the end; my tests suggest this is a safe way to call these functions. But by all means, you can do whatever.
void Gaara::scanTheSpot()
{
    Pix *someNewPix;
    char *outText;
    ostringstream tempStream;

    someNewPix = pixCreate( 200, 40, 32 );
    convertEasyBmpToPix( &scanImage, someNewPix, 87, 42 );

    readSomeNombers->SetImage( someNewPix );
    outText = readSomeNombers->GetUTF8Text();

    tempStream.str( "" );
    tempStream << outText;
    classMemberVariable = tempStream.str();

    //pixWrite( "test.bmp", someNewPix, IFF_BMP );

    delete [] outText;          // GetUTF8Text() allocates the buffer with new[]
    pixDestroy( &someNewPix );  // release the Pix now that the text is extracted
}
The object that has the information that I want to scan is in memory and is pointed to by &scanImage. It is from the "EasyBMP" library, but that is not important.
I deal with it in a function in "functions.h"/"functions.cpp". By the way, I am doing a little extra processing here while I am in the loop, namely thinning the characters, making the image black and white, and reversing black and white, which is unnecessary. At this phase in my development I am still looking for ways to improve the recognition, though for my purposes this has not yielded bad data yet. My view is to use the default Tess data for simplicity; I am acting heuristically to solve a very complex problem.
void convertEasyBmpToPix( BMP *sourceImage, PIX *outputImage, unsigned startX, unsigned startY )
{
    int endX = startX + pixGetWidth( outputImage );
    int endY = startY + pixGetHeight( outputImage );
    unsigned destinationX;
    unsigned destinationY = 0;

    for( int yLoop = startY; yLoop < endY; yLoop++ )
    {
        destinationX = 0;
        for( int xLoop = startX; xLoop < endX; xLoop++ )
        {
            // GetPixel() returns by value, so keep a local to take its address.
            RGBApixel sourcePixel = sourceImage->GetPixel( xLoop, yLoop );

            // Deliberately inverted: bright source pixels become black and vice versa.
            if( isWhite( &sourcePixel ) )
            {
                pixSetRGBPixel( outputImage, destinationX, destinationY, 0, 0, 0 );
            }
            else
            {
                pixSetRGBPixel( outputImage, destinationX, destinationY, 255, 255, 255 );
            }
            destinationX++;
        }
        destinationY++;
    }
}
bool isWhite( RGBApixel *image )
{
    // A pixel counts as white only if all three channels are reasonably bright.
    if( ( image->Red   < 50 ) ||
        ( image->Blue  < 50 ) ||
        ( image->Green < 50 ) )
    {
        return false;
    }
    return true;
}
One thing I don't like is the way I declare the size of the Pix outside the function. It seems that if I try to do it within the function I get unexpected results, as if the memory allocated inside is destroyed when I leave.
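For what it's worth, pixCreate() allocates the Pix on the heap, so a Pix created inside a function does survive the return as long as the pointer is handed back to the caller; a minimal sketch (the function name is made up):

// A heap-allocated Pix outlives the function that created it;
// the caller owns the result and must pixDestroy() it eventually.
Pix *makeScanBuffer( int width, int height )
{
    Pix *pix = pixCreate( width, height, 32 );  // heap allocation, not stack
    return pix;                                 // still valid after return
}

// Caller:
// Pix *buffer = makeScanBuffer( 200, 40 );
// ...use it...
// pixDestroy( &buffer );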
Certainly not my most elegant work, but I also gutted the hell out of it for simplicity. Why I bother to share this I don't know; I should have kept it to myself. What is my name? Kage.Sabaku.No.Gaara
Before I let you go, I should mention the subtle differences between my Windows Forms app and the default settings: namely, I use the "multi-byte" character set (project properties and such). Give a dog a bone, maybe a vote?
P.P.S. I hate to say it, but I made one change to host.c; if you use 64-bit you can do the same. Otherwise you're on your own. My reason was a bit insane and you don't have to, it's kinda silly actually:
typedef unsigned int uinT32;
#if (_MSC_VER >= 1200)             //%%% vkr for VC 6.0
typedef __int64 inT64;
typedef unsigned __int64 uinT64;
#else
typedef long long int inT64;
typedef unsigned long long int uinT64;
#endif                             //%%% vkr for VC 6.0
typedef float FLOAT32;
typedef double FLOAT64;
typedef unsigned char BOOL8;

Related

Coapp / autopkg : multiple include folders in /build/native/include/

I am trying to build a NuGet package via the CoApp tool for C++.
The package needs to embed 3 folders when compiling a .cpp file that uses it.
So I want an internal include structure as follows:
/build/native/include/lib1,
/build/native/include/lib2,
/build/native/include/lib3
My question: how do I add several include folders under /build/native/include/?
I tried:
Multiple blocks of the following (varying lib1, lib2, lib3):
nestedInclude += {
    #destination = ${d_include}lib1;
    ".\lib1\**\*.hpp", ".\lib1\**\*.h"
};
and multiple blocks of the following (varying lib1, lib2, lib3):
nestedInclude {
    #destination = ${d_include}lib1;
    ".\lib1\**\*.hpp", ".\lib1\**\*.h"
};
but it seems CoApp accumulates the .h/.hpp files across the blocks (depending on whether the += operator is used) and, at the end, adds all of them under the last #destination tag value. So I get a single entry: /build/native/include/lib3
The destination is overwritten in your example, and therefore you get everything flat in the last given address. To handle this you can instead create multiple nested includes:
nested1Include: {
    #destination = ${d_include}lib1;
    ".\lib1\**\*.hpp", ".\lib1\**\*.h"
}
nested2Include: {
    #destination = ${d_include}lib2;
    ".\lib2\**\*.hpp", ".\lib2\**\*.h"
}
I've just hit the same issue, and Gorgar's answer set me on the right track, thank you. But I do have one additional piece of information: I only had one underlying directory, and in that case CoApp still flattened everything. The trick is to make it think it has two, even if it doesn't, like this:
include1: {
    #destination = ${d_include}NativeLogger;
    "include\NativeLogger\*.h"
};

// The use of a second include spec here which doesn't actually address any files
// is to force CoApp to create the substructure of the first include. There is some
// discussion on the net about bugginess related to include structures, but this
// seems to fix it.
include2: { include\* };

How to compile source code separately in c++

I hope someone can help me address this fundamental problem that I have been trying to tackle for the last two weeks.
I have a solution that contains 4 projects and some libraries that the project files depend on. In each of these projects, a copy of the logic.cpp file has been included; it contains a long list of logic which in pseudocode looks like this:
BOOL myLogic()
{
    if(...)
    {
        switch(...)
        {
        case 1:
            doA();
            break;
        case 2:
            doB();
            break;
        ...
        case 20:
            doSomething();
            break;
        }
    }
}
Project #1 generates the exe of the tool, while project #2 generates the dll version of the tool that I'm building; the other 2 projects act as utility files for my tool. As you can see, there are some 20 cases that the logic can run into, and it is pretty massive.
So, my problem now is that all of this source code is being compiled into my single exe or dll, even when some of these cases may never be reached in certain deployment scenarios. What I want to achieve is to break up this switch-case and compile 20 different sets of exe and dll, so that:
1) The application has a smaller footprint.
2) The sources are protected to a certain extent against reverse engineering.
Hence, I would like to seek advice from the community on how to go about solving this problem, while still using Visual Studio's built-in compilation. (I could build the 20 sets of exe and dll with "Build Solution".)
Thank you, and I appreciate any advice. Feel free to ask for clarification if I have not been clear enough in my question.
Create a new project that compiles into a static library. In that project, create separate source .cpp files for all the 20 functionalities; split logic.cpp into 20 separate files. (Splitting into more source files is just for the sake of maintainability.) If there are common code parts, you can create more source files to contain those parts.
Now create 2x20 new projects: 20 exe projects and 20 dll projects. Each of these projects depends on the static library project created in step 1, and each is nothing but a simple stub that calls exactly one of the functionalities from the common library (see the sketch below).
When you build the solution, you will have 20 differently named executables and 20 differently named dlls, one pair per functionality. If dead-code elimination is turned on in the linker, then none of the exes/dlls will contain code that is not required for its specific function.
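A minimal sketch of one such stub (logic.h and doA() are placeholder names for whatever the real split produces):

// stub_doA.cpp - the entire source of exe project #1: a thin wrapper
// around exactly one functionality from the common static library.
#include "logic.h"   // header of the static library from step 1

int main()
{
    doA();   // the only functionality this executable pulls in
    return 0;
}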
What about some handwork?
Introduce some defines for your scenarios, or use some standard ones like _ISDLL,
and encase the cases :-) that you know cannot be reached in #ifdefs:
#ifdef _ISDLL
case x:
    break;
#endif

Function definitions missing from intellisense in Visual Studio C++ 2005-2013

The following problem has plagued one of my projects for a long time:
Some function definitions (from .cpp files) are excluded/hidden from IntelliSense!
It is not possible to "Go To Definition" for those functions, nor are they listed in the Navigation Bar.
The functions do appear in the autocompletion list, though. The problem is for .cpp files only; the .h files are parsed fine. "Go To Declaration" works, too.
It has been the same since 2005; with every new version I was hoping for a fix, but it does not seem to be recognized as a bug by anyone else.
UPDATE:
I have tracked this down to the following: all functions containing a certain macro are not recognized by IntelliSense. The original macro was
#define forlist(x,list) for( auto x= list.begin(); x.valid(); ++x)
but you can also use the simplified test case
#define fortest(x) for( auto x= 1; x< 2; ++x)
void myclass::TestFN()
{
    fortest( g )
    {
        g;
    }
}
The next step would be to find a workaround (or try to go through Microsoft bug reporting).
Please don't rant too much about this macro. This is existing code of a list implementation which I am not able to change. I could just NOT use the macro, but I still think this is a VS bug.
One funny thing is that the following (really ***ic) macro works fine:
#define fortest(x) for( auto x= 1; x< 2; ++x) {

void myclass::TestFN()
{
    fortest( g )
        g;
    }   // closes the brace opened inside the macro
}
Could it be that IntelliSense treats case 1 as an illegal local function definition?
(see http://connect.microsoft.com/VisualStudio/feedback/details/781121/c-intellisense-mistakes-loop-expression-for-function-definition)
The following works fine, too:
#define fortest(x) for( auto x= 1; x< 2; ++x)

void myclass::TestFN()
{
    fortest( g )
        g;
}
As usual, interest in my question ebbed after a couple of hours, so I had to figure it out by myself...
We just have to use the concept of cpp.hint files.
Basically, you have to put the troublesome macros into a file named cpp.hint and put that file in your solution directory (which did not work for me)
OR in a parent directory of the one your code files reside in (worked for me).
In that file we just put the troublesome macros WITHOUT the right-hand side, e.g.:
#define forlist(x,list)
NOTE: You must reset the IntelliSense cache to pick up the new data from the changed cpp.hint file. You should:
delete the ipch folder (usually placed in the solution folder),
delete all *.sdf files in the solution folder,
delete all *.VC.db files in the solution folder or in the ipch folder.
For more advanced macros (like having 'start' and 'end' macros for code blocks), there are some other tricks.
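One such trick (a sketch with made-up macro names; adapt it to your own macros) is to hint the block macros as bare braces, so that the parser still sees balanced scopes:

// cpp.hint (hypothetical entries)
#define BEGIN_LOOP(x) {
#define END_LOOP }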
The original link is:
http://msdn.microsoft.com/en-us/library/dd997977.aspx
The reason for the trouble is that IntelliSense performance would (potentially) decrease dramatically if it had to parse all macros in a project, so it only parses those given explicitly in cpp.hint.

Can I programmatically collapse/expand all preprocessor blocks of a certain name in Visual Studio 2012?

My current project has a lot of debug preprocessor blocks scattered throughout the code. These are intentionally named differently from the system _DEBUG and NDEBUG macros, so I have a lot of this:
// Some code here
#ifdef PROJNAME_DEBUG
//unit tests, assumption testing, etc.
#endif
// code continues
These blocks sometimes get rather large, and their presence can sometimes inhibit code readability. In Visual Studio 2012 I can easily collapse these, but it would be nice to automatically have all of them collapsed, allowing me to expand them if I want to see what's in there. However, as I also have a bunch of header guards I don't want to collapse all preprocessor blocks, only the #ifdef PROJNAME_DEBUG ones.
Can I do this?
This is the easiest way you can achieve it, I think.
You should first create an Add-In in C#. (In VS 2013 they became deprecated :( )
In the OnConnection method you should add your command:
public void OnConnection( object application, ext_ConnectMode connectMode, object addInInst, ref Array custom )
{
    _applicationObject = (DTE2)application;
    if (connectMode == ext_ConnectMode.ext_cm_AfterStartup || connectMode == ext_ConnectMode.ext_cm_Startup)
    {
        Commands2 commands = (Commands2)_applicationObject.Commands;
        try
        {
            //Add a command to the Commands collection:
            Command command = commands.AddNamedCommand2(_addInInstance, "MyAddinMenuBar", "MyAddinMenuBar", "Executes the command for MyAddinMenuBar", true, 59, ref contextGUIDS, (int)vsCommandStatus.vsCommandStatusSupported + (int)vsCommandStatus.vsCommandStatusEnabled, (int)vsCommandStyle.vsCommandStylePictAndText, vsCommandControlType.vsCommandControlTypeButton);
        }
        catch (System.ArgumentException)
        {
            //If we are here, bla, bla... (Auto generated)
        }
    }
}
Note: you can find out how the parameters act in the reference for AddNamedCommand2.
The template-created version would also be fine, but naturally it is worth naming your command properly.
After that you need to add your logic to the Exec method:
public void Exec( string commandName, vsCommandExecOption executeOption, ref object varIn, ref object varOut, ref bool handled )
{
    handled = false;
    if (executeOption == vsCommandExecOption.vsCommandExecOptionDoDefault)
    {
        if (commandName == "MyAddinMenuBar.Connect.MyAddinMenuBar")
        {
            List<string> args = (varIn as string).Split(' ').ToList();
            TextSelection ts;
            ts = (TextSelection)_applicationObject.ActiveDocument.Selection;
            EditPoint ep = (ts.ActivePoint).CreateEditPoint();
            ep.StartOfDocument();
            do
            {
                string actualLine = ep.GetLines(ep.Line, ep.Line + 1);
                if (args.TrueForAll(filter => actualLine.Contains(filter)))
                {
                    _applicationObject.ExecuteCommand("Edit.GoTo", ep.Line.ToString());
                    _applicationObject.ExecuteCommand("Edit.ToggleOutliningExpansion");
                }
                ep.LineDown();
            } while (!ep.AtEndOfDocument);
            handled = true;
            return;
        }
    }
}
Note: the name you gave to the command is checked in Exec.
Then you can build.
Deployment of the Add-In can happen through a [ProjectName].AddIn file in ..\Documents\Visual Studio 20[XY]\AddIns\. (Created by the template; you should copy it if you move the Add-In elsewhere.)
You should place your Add-In assembly where the Assembly element of the mentioned file points. To change the version, modify the text in the Version element.
After you have deployed it and started Studio, activate the Add-In in the manager in the Tools menu.
You need to expand all collapsible sections in your code file (Ctrl+M+L with C# IDE settings).
This is required because I only found a way to invert the state of collapsing. If you find a better command, you can change it.
Next, activate the Command Window to use the created command.
Now you only need to type your command's name, like this:
MyAddinMenuBar.Connect.MyAddinMenuBar #ifdef PROJNAME_DEBUG
Hopefully magic will happen.
This solution is independent of the language of the code you edit, so it is pretty multifunctional.

Including C++ headers in user mode programs built with NT DDK

So...I have a kernel mode component and a user mode component I'm putting together using the turnkey build environment of the NT DDK 7.1.0. The kernel component is all .c/.h/.rc files. The user mode component is .cpp/.c/.h/.rc files.
At first it seemed simplest to use build for both, as I saw you could modify the ./sources file of the user mode component to say something like:
TARGETNAME = MyUserModeComponent
TARGETTYPE = PROGRAM
UMTYPE = windows
UMENTRY = winmain
USE_MSVCRT = 1
That didn't seem to cause a problem, and so I was pleased until I tried to #include <string> (or <memory>, or whatever). The compiler doesn't find that stuff:
error C1083: Cannot open include file: 'string': No such file or directory
Still, it is compiling the user mode piece with C++ language semantics. But how do I get the standard includes to work?
I don't technically need to use the DDK build tool for the user mode piece. I could make a Visual Studio solution. I'm a bit wary, as I have bumped into other annoyances, like the fact that the DDK uses __stdcall instead of __cdecl by default... and there isn't any pragma or compiler switch to override this. You literally have to go into each declaration you care about and change it, assuming you have the source to do so. :-/
I'm starting to wonder if this is just a fractal descent into "just because you CAN doesn't mean you SHOULD build user mode apps with the DDK. Here be dragons." So my question isn't just about this particular technical hurdle, but rather whether I should abandon the idea of building a C++ user mode component with the DDK tools... just because the kernel component is pure C.
To build a user mode program with the WINDDK you need to add some variables to your SOURCES file:
386_STDCALL=0 to use the cdecl calling convention by default
USE_STL=1 to use the STL
USE_NATIVE_EH=1 to add support for exception handling
Everything else you already have.
Here is my full SOURCES file for reference:
TARGETNAME    = MyUserModeComponent
TARGETTYPE    = PROGRAM
TARGETPATH    = obj
UMTYPE        = console
UMENTRY       = main
USE_MSVCRT    = 1
USE_NATIVE_EH = 1
USE_STL       = 1
386_STDCALL   = 0
SOURCES       = main.cpp
And main.cpp:
#include <iostream>
#include <string>
using namespace std;
int main()
{
string s = "bla bla bla!";
cout << s;
return 0;
}
Have fun!
Quick Answer
Abandon the idea of building user-mode components with DDK tools (although I find the concept fascinating :-P).
Your kernel mode component should be built separately from the user mode components as a matter of good practice.
Vague thoughts
Off the top of my head, and really speaking from limited experience, there are a lot of subtle differences that can creep up if you try to mix the two together.
Take your own example of __cdecl vs __stdcall: you have two different calling conventions. The kernel stuff is all __cdecl, while the C++ methods are wrapped in WINAPI (__stdcall) conventions; __stdcall does automatic callee stack cleanup and expects frame pointers inserted all over the place. And if you accidentally use compiler options that trigger __fastcall, it would be a pain to debug.
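A tiny illustration of the kind of mismatch that bites here (a sketch; the function names are made up, and the keywords are MSVC-specific):

// Calling-convention mismatch in miniature.
int __cdecl   addC( int a, int b ) { return a + b; }  // caller cleans the stack
int __stdcall addS( int a, int b ) { return a + b; }  // callee cleans the stack

typedef int (__cdecl *CdeclFn)( int, int );

int main()
{
    CdeclFn f = addC;     // fine: the conventions match
    // CdeclFn g = addS;  // rejected by MSVC (error C2440): the conventions
                          // differ, and forcing it with a cast corrupts the stack
    return f( 1, 2 );
}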
You can definitely hack something together, but do you really want to keep track of that in your user-space code and build environment? UGH, I say.
Unless you have very specific engineering reasons to mix the two environments (and no, a unified build experience is not a valid reason, because you can get that from a batch file called buildall.bat), I say use the separate toolchains.