The semantics of TessBaseAPI::Clear() - c++

Suppose I've created two objects of TessBaseAPI — xapi and yapi — initialized by calling the following overload of Init() function:
int Init(const char * datapath,
const char * language,
OcrEngineMode oem,
char ** configs,
int configs_size,
const GenericVector< STRING > * vars_vec,
const GenericVector< STRING > * vars_values,
bool set_only_non_debug_params
);
passing exactly identical arguments.
Since the objects are initialized with identical arguments, I assume that at this point xapi and yapi are identical from a behavioral [1] perspective. Is my assumption correct? I hope so, as I don't see any reason for the objects to differ.
Now I'm going to use xapi to extract information from an image, but before that I call SetVariable() a number of times to set a few more configuration values:
bool SetVariable(const char * name, const char * value);
Then I use xapi to extract some text from an image. Once I'm done with the extraction, I do this:
xapi.Clear(); //what exactly happens here?
After the call to Clear(), can I use xapi and yapi interchangeably? In other words, can I assume that xapi and yapi are identical at this point from a behavioral [1] perspective? Can I say Clear() is actually a reset functionality?
[1] By "behavioral", I mean performance in terms of accuracy, not speed/latency.

According to the documentation of void tesseract::TessBaseAPI::Clear(), calling this function frees the image data and the recognition results. It says nothing about configuration data. Moreover, the documentation suggests that anything the authors consider time-consuming to load is kept intact: the data is freed "without actually freeing any recognition data that would be time-consuming to reload".
Answering your other questions:
"After the call to Clear(), can I use xapi and yapi interchangeably?" -- yes, you may, but the results might differ because of the settings you have applied to xapi via SetVariable() but not to yapi.
"In other words, can I assume that xapi and yapi are identical at this point from a behavioral perspective?" -- depending on which settings you have changed with SetVariable(), the results may or may not be the same.
"Can I say Clear() is actually a reset functionality?" -- only the recognition results and the image data are discarded; everything else is kept intact. Depending on your definition of reset, you may call it a reset or not, it's a free country after all =)
You may check the difference between Clear() and the full teardown using End(). It's around line 1400 of baseapi.cpp.

Since the objects are initialized with identical arguments, at this point xapi and yapi are assumed to be identical from behavioral perspective. Is my assumption correct?
From the outset there is nothing I can find to dispute this assumption.
Investigating the source code shows which members are cleared or reset (if you will).
Calling Clear() runs the following:
void TessBaseAPI::Clear() {
  if (thresholder_ != NULL)
    thresholder_->Clear();
  ClearResults();
}
Calling thresholder_->Clear(); destroys the pix (if not null)
// Destroy the Pix if there is one, freeing memory.
void ImageThresholder::Clear() {
  if (pix_ != NULL) {
    pixDestroy(&pix_);
    pix_ = NULL;
  }
  image_data_ = NULL;
}
ClearResults() is shown below:
void TessBaseAPI::ClearResults() {
  if (tesseract_ != NULL) {
    tesseract_->Clear();
  }
  if (page_res_ != NULL) {
    delete page_res_;
    page_res_ = NULL;
  }
  recognition_done_ = false;
  if (block_list_ == NULL)
    block_list_ = new BLOCK_LIST;
  else
    block_list_->clear();
  if (paragraph_models_ != NULL) {
    paragraph_models_->delete_data_pointers();
    delete paragraph_models_;
    paragraph_models_ = NULL;
  }
}
The page results and paragraph models are deleted and set to NULL, the block list is emptied (or created if it did not exist), and the recognition_done_ flag is reset to false.
tesseract_->Clear() releases the following:
void Tesseract::Clear() {
  pixDestroy(&pix_binary_);
  pixDestroy(&cube_binary_);
  pixDestroy(&pix_grey_);
  pixDestroy(&scaled_color_);
  deskew_ = FCOORD(1.0f, 0.0f);
  reskew_ = FCOORD(1.0f, 0.0f);
  splitter_.Clear();
  scaled_factor_ = -1;
  ResetFeaturesHaveBeenExtracted();
  for (int i = 0; i < sub_langs_.size(); ++i)
    sub_langs_[i]->Clear();
}
Note that SetVariable() does not affect init values. Per its documentation, it only works for non-init variables (init variables should be passed to Init()):
bool TessBaseAPI::SetVariable(const char* name, const char* value) {
  if (tesseract_ == NULL) tesseract_ = new Tesseract;
  return ParamUtils::SetParam(name, value, SET_PARAM_CONSTRAINT_NON_INIT_ONLY,
                              tesseract_->params());
}
After the call to Clear(), can I use xapi and yapi interchangeably?
No. Certainly not if you used a thresholder.
Can I say Clear() is actually a reset functionality?
Not in the sense of restoring it to its initialised state. It sets some members of the original object to null, but it keeps the expensive work tied to parameters such as const char * datapath, const char * language and OcrEngineMode oem. It seems to be a way to free memory without obliterating the object, in line with "without actually freeing any recognition data that would be time-consuming to reload".
After calling Clear(), call either SetImage or TesseractRect before using the Recognize or Get* functions.
Clear() will not discard values set through SetVariable(); they are only reset to their defaults when the object is torn down by calling End().
Looking at the TessBaseAPI() constructor, you can see what is initialised and which of these members Clear() resets.
TessBaseAPI::TessBaseAPI()
  : tesseract_(NULL),
    osd_tesseract_(NULL),
    equ_detect_(NULL),
    // Thresholder is initialized to NULL here, but will be set before use by:
    // A constructor of a derived API, SetThresholder(), or
    // created implicitly when used in InternalSetImage.
    thresholder_(NULL),
    paragraph_models_(NULL),
    block_list_(NULL),
    page_res_(NULL),
    input_file_(NULL),
    output_file_(NULL),
    datapath_(NULL),
    language_(NULL),
    last_oem_requested_(OEM_DEFAULT),
    recognition_done_(false),
    truth_cb_(NULL),
    rect_left_(0), rect_top_(0), rect_width_(0), rect_height_(0),
    image_width_(0), image_height_(0) {
}
Given that the base constructor for the class is:
(datapath, language, OEM_DEFAULT, NULL, 0, NULL, NULL, false);
These three parameters are always needed, which makes sense.
If the datapath, OcrEngineMode or the language have changed - start again.
Note that the language_ field stores the last requested language that was initialized successfully, while tesseract_->lang stores the language actually used. They differ only if the requested language was NULL, in which case tesseract_->lang is set to the Tesseract default ("eng").
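To make the distinction concrete, here is a toy model of the behaviour described above. It is not Tesseract code: ToyApi and all of its members are hypothetical stand-ins that only mimic which state Clear() drops (image and results) and which state survives until End() (parameters set after Init()).

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for an OCR API object; NOT Tesseract's implementation.
class ToyApi {
public:
    void Init(const std::string& language) {
        language_ = language;
        params_.clear();  // start again from default parameters
        initialized_ = true;
    }
    bool SetVariable(const std::string& name, const std::string& value) {
        if (!initialized_) return false;
        params_[name] = value;
        return true;
    }
    void SetImage(const std::string& image) { image_ = image; }
    void Recognize() { result_ = "text from " + image_; }

    // Mimics Clear(): drops image data and results, keeps configuration.
    void Clear() {
        image_.clear();
        result_.clear();
    }

    // Mimics End(): full teardown; Init() is needed again afterwards.
    void End() {
        Clear();
        params_.clear();
        language_.clear();
        initialized_ = false;
    }

    // True when both objects would be configured identically for the next image.
    bool SameConfigAs(const ToyApi& other) const {
        return language_ == other.language_ && params_ == other.params_;
    }
    bool HasResult() const { return !result_.empty(); }

private:
    bool initialized_ = false;
    std::string language_, image_, result_;
    std::map<std::string, std::string> params_;
};
```

In this model, calling SetVariable() on xapi and then Clear() leaves xapi configured differently from an untouched yapi; only End() followed by Init() brings the two back in line.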

Related

How can I calculate a hash/checksum/fingerprint of an object in c++?

Requirements:
The function must be 'injective'(*). In other words, there should be no two different input objects, that return the same hash/checksum/fingerprint.
Background:
I am trying to come up with a simple pattern for checking whether or not an entity object has been changed since it was constructed. (In order to know which objects need to be updated in the database).
Note that I specifically do not want to mark the object as changed in my setters or anywhere else.
I am considering the following pattern: In short, every entity object that should be persisted has a member function "bool is_changed()". Changed, in this context, means changed since the object's constructor was called.
Note: My motivation for all this is to avoid the boilerplate code that comes with marking objects as clean/dirty or doing a member by member comparison. In other words, reduce risk of human error.
(Warning: pseudo C++ code ahead. I have not tried compiling it.)
#include <string>

class Foo {
private:
    std::string my_string;
    // Assume the "fingerprint" is of type long.
    long original_fingerprint;
    // const, so that is_changed() const is allowed to call it.
    long current_fingerprint() const
    {
        // *** Suggestions on which algorithm to use here? ***
    }
public:
    Foo(const std::string& my_string) :
        my_string(my_string)
    {
        original_fingerprint = current_fingerprint();
    }
    bool is_changed() const
    {
        // If new calculation of fingerprint is different from the one
        // calculated in the constructor, then the object has
        // been changed in some way.
        return current_fingerprint() != original_fingerprint;
    }
    void set_my_string(const std::string& new_string)
    {
        my_string = new_string;
    }
};
void client_code()
{
    auto foo = Foo("Initial string");
    // should now return **false** because
    // the object has not yet been changed:
    foo.is_changed();
    foo.set_my_string("Changed string");
    // should now return **true** because
    // the object has been changed:
    foo.is_changed();
}
(*) In practice, not necessarily in theory (like uuids are not unique in theory).
You can use the CRC32 algorithm from Boost. Feed it with the memory locations of the data you want to checksum. You could use a cryptographic hash for this, but those are designed to guard against intentional data corruption and are slower. A CRC performs better.
For this example, I've added another data member to Foo:
int my_integer;
And this is how you would checksum both my_string and my_integer:
#include <boost/crc.hpp>
// ...
long current_fingerprint() const
{
    boost::crc_32_type crc32;
    crc32.process_bytes(my_string.data(), my_string.length());
    crc32.process_bytes(&my_integer, sizeof(my_integer));
    return crc32.checksum();
}
However, now we're left with the issue of two objects having the same fingerprint if my_string and my_integer are equal. To fix this, we should include the address of the object in the CRC, since C++ guarantees that different objects will have different addresses.
One would think we can use:
process_bytes(&this, sizeof(this));
to do it, but we can't since this is an rvalue and thus we can't take its address. So we need to store the address in a variable instead:
long current_fingerprint() const
{
    boost::crc_32_type crc32;
    // 'this' is a pointer to const here, so store it as const void*.
    const void* this_ptr = this;
    crc32.process_bytes(&this_ptr, sizeof(this_ptr));
    crc32.process_bytes(my_string.data(), my_string.length());
    crc32.process_bytes(&my_integer, sizeof(my_integer));
    return crc32.checksum();
}
Such a function does not exist, at least not in the context that you are requesting.
The STL provides hash functions for basic types (std::hash), and you could use these to implement a hash function for your objects using any reasonable hashing algorithm.
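As a rough sketch of what such a std::hash-based function could look like: the Entity struct and the helper names below are made up for illustration, and the mixing step follows the widely used boost::hash_combine-style formula. The result is an ordinary hash, so equal objects always match but different objects can still collide.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical entity with two members, mirroring the Foo example.
struct Entity {
    std::string my_string;
    int my_integer;
};

// Mixes one more hash value into a running seed (hash_combine-style formula).
inline void hash_combine(std::size_t& seed, std::size_t value) {
    seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Non-injective fingerprint: equal objects give equal values,
// but two different objects may give the same value too.
inline std::size_t fingerprint(const Entity& e) {
    std::size_t seed = 0;
    hash_combine(seed, std::hash<std::string>{}(e.my_string));
    hash_combine(seed, std::hash<int>{}(e.my_integer));
    return seed;
}
```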
However, you seem to be looking for an injective function, which causes a problem. Essentially, to have an injective function, it would be necessary to have an output of size greater than or equal to that of the object you are considering, since otherwise (by the pigeonhole principle) there would be two inputs that give the same output. Given that, the most sensible option would be to just do a straight-up comparison of the object to some sort of reference object.
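A minimal sketch of the reference-object comparison, assuming the entity's persisted members can be grouped into a small struct. The nested State type and its operator== are illustrative choices, not something from the question.

```cpp
#include <string>
#include <utility>

// Hypothetical entity: keeps a copy of its state at construction time
// and compares against that copy instead of computing a fingerprint.
class Foo {
public:
    explicit Foo(std::string s) : state_{std::move(s)}, original_(state_) {}

    void set_my_string(const std::string& s) { state_.my_string = s; }

    // Exact comparison: no collisions, at the cost of storing a second copy.
    bool is_changed() const { return !(state_ == original_); }

private:
    struct State {
        std::string my_string;
        bool operator==(const State& o) const { return my_string == o.my_string; }
    };
    State state_;     // current values
    State original_;  // snapshot taken in the constructor
};
```

Setting a member back to its original value makes is_changed() return false again, which a "dirty flag" set in the setters would not do.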

What is the difference between not initializing a pointer, and having it be initialized to null?

I'm building a simple generic engine as my first real start in making games, and I'm trying to keep it organized and decent, meaning I don't want it to be something I throw away once I've made what I'm planning to.
I add objects to be displayed, drawObjects, and these can either move or not move, and either have an animation or not.
In case they DO have an animation, I want to initialize a single animationSet, and this animationSet will have xxx animationComp inside of it. As I'm trying to be neat and have worked a bit on "optimizations" of memory and CPU usage (such as sharing already-loaded image pointers, and whatever else came to mind), I didn't want to request possibly unused memory in arrays.
So I had animationSetS* animationSet = NULL; initially, planning to do animationSet = animationSetS[spacesINEED]; afterwards, only for the added objects that need animation. Objects without animation would keep a NULL pointer and therefore use no memory (correct?).
And then this question popped up! (title)
struct animationComp {
    SDL_Rect* clip;
    int clipsize;
};
struct animationSetS {
    animationComp* animation;
    int currentFrame;
    int currentAnimation;
    int animationNumber;
};
struct drawObject { // An object.
    char* name;
    SDL_Surface* surface;
    bool draw = true;
    float xPos;
    float yPos;
    bool willMove = false; // 0 - Won't move, 10 - Moves a lot, TO IMPLEMENT
    bool isSprite = false;
    animationSetS* animationSet;
};
I ramble a lot in my questions, sorry for that. If anything needs clarifying, reply here; I'll be answering within 10 minutes for the next hour or so, perhaps longer.
Thanks!
Setting the pointer to NULL means that you'll be able to add ASSERT(ptr != NULL); and KNOW that your pointer does not accidentally contain some rubbish value from whatever happens to be in the memory it was using.
So, if for some reason, you end up using the object before it's been properly set up, you can detect it.
It also helps when a field is sometimes unused: you can still call delete stuff; unconditionally, since deleting a null pointer does nothing [assuming any non-null value was allocated with new in the first place].
Note that leaving a variable uninitialized means that it can have ANY value within its valid range [and for some types, outside the valid range - e.g. pointers and floating point values can be "values that are not allowed by the processor"]. This means that it's impossible to "tell" within the code whether it has been initialized or not - but things will go horribly wrong if you don't initialize things!
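A small sketch of the point above, using simplified stand-ins for the structs in the question (AnimationSet and DrawObject here are illustrative, not the asker's exact types). Because the pointer starts out null, "no animation" is a state the code can test for, and the destructor can delete it unconditionally.

```cpp
// Simplified stand-in for the animation data.
struct AnimationSet {
    int currentFrame = 0;
};

struct DrawObject {
    // Initialised to null: "no animation" is a detectable state.
    AnimationSet* animationSet = nullptr;

    DrawObject() = default;
    // The object owns a raw pointer, so copying is forbidden in this sketch.
    DrawObject(const DrawObject&) = delete;
    DrawObject& operator=(const DrawObject&) = delete;

    // Deleting a null pointer is a no-op, so no check is needed here.
    ~DrawObject() { delete animationSet; }

    bool hasAnimation() const { return animationSet != nullptr; }

    // Allocate the animation data only for objects that need it.
    void enableAnimation() {
        if (!animationSet) animationSet = new AnimationSet;
    }
};
```

Had animationSet been left uninitialized, hasAnimation() could not give a meaningful answer and the destructor would risk deleting a garbage address.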
If this is really meant to be implemented in C++ (as you write), why don't you use the C++ Standard Library? For example:
#include <memory>
#include <vector>
struct animationSetS {
    std::vector< std::shared_ptr<animationComp> > animation;
    // ...
};

Saving image with libpng - const object

I have a class which manages a grayscale image. I want to save it with libpng. To do that I want to use a const member function like this:
void GrayscaleImage::SavePNG(std::string filename) const
{
    // ...
    png_bytep* row_pointers = new png_bytep[m_height];
    for (int i = 0; i < m_height; i++) {
        row_pointers[i] = const_cast<png_bytep>(m_data.data()) + i * m_width * sizeof(uint8_t);
    }
    png_set_rows(png_ptr, info_ptr, row_pointers);
    // ...
}
The problem is that the third argument of png_set_rows is non-const, so I have to use const_cast at some point, if I want the member function GrayscaleImage::SavePNG to be const. I'm wondering, is it safe to do this?
libpng provides an API to cause it to free the row_pointers and the stuff they point to; png_data_freer. That's the default on read (where png_set_rows can currently be called but the call gets ignored).
What you did is safe, so long as you don't call png_data_freer. None of the write APIs modify the input data.
The same non-const parameter problem exists in png_write_image (the API that png_write_png calls) and in png_write_rows. It used to exist in png_write_row, which is the lowest level API, but that was fixed in libpng 1.5; it's a quiet API change there because it doesn't change the type compatibility of the argument. Changing it any higher would cause existing applications to fail to compile because of type errors.
You're not likely to see changes soon; changing the API in ways that require applications to rewrite existing code is unlikely to happen until libpng 2.0 in my opinion.
It's not safe. Without the third parameter of png_set_rows() declared const, you have no guarantee that it will not modify the input data.
You will always have to deal with libraries that don't declare parameters const even if they could. That is why const_cast exists. You should use it with extreme caution. But judging from the description, it's unlikely that png_set_rows() will modify your data.
EDIT: here is the source code. You can see it doesn't modify row_pointers. (But it definitely modifies the other two arguments!)
void PNGAPI
png_set_rows(png_structp png_ptr, png_infop info_ptr, png_bytepp row_pointers)
{
   png_debug1(1, "in %s storage function", "rows");
   if (png_ptr == NULL || info_ptr == NULL)
      return;
   if (info_ptr->row_pointers && (info_ptr->row_pointers != row_pointers))
      png_free_data(png_ptr, info_ptr, PNG_FREE_ROWS, 0);
   info_ptr->row_pointers = row_pointers;
   if (row_pointers)
      info_ptr->valid |= PNG_INFO_IDAT;
}
The const in your function definition just says that your instance shouldn't change. Saving to a file shouldn't change your instance so it's alright.
Of course the third parameter of png_set_rows isn't const because it gets set.
It doesn't matter if you create, destroy or change things in a const function as long as they don't belong to your class. Your code doesn't change any instance of GrayscaleImage.
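The pattern discussed in this thread can be sketched without libpng. In the example below, legacy_sum_rows is a made-up stand-in for a C-style API that takes non-const row pointers but only reads through them, and GrayImage is a hypothetical class. The const_cast is acceptable here only because the callee is known not to write to the data.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical legacy C-style function: non-const parameter, read-only behaviour.
inline unsigned long legacy_sum_rows(std::uint8_t** rows, int height, int width) {
    unsigned long sum = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            sum += rows[y][x];
    return sum;
}

class GrayImage {
public:
    GrayImage(int w, int h, std::uint8_t fill)
        : width_(w), height_(h), data_(static_cast<std::size_t>(w) * h, fill) {}

    // const member function: the image itself is never modified.
    unsigned long Checksum() const {
        // A vector owns the row-pointer array, so nothing leaks on return.
        std::vector<std::uint8_t*> rows(height_);
        for (int y = 0; y < height_; ++y) {
            rows[y] = const_cast<std::uint8_t*>(data_.data())
                      + static_cast<std::size_t>(y) * width_;
        }
        return legacy_sum_rows(rows.data(), height_, width_);
    }

private:
    int width_;
    int height_;
    std::vector<std::uint8_t> data_;
};
```

If the callee ever did write through those pointers while the GrayImage object itself was declared const, the behaviour would be undefined, which is why the read-only assumption needs to be checked against the library's source or documentation.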

How to make a getter function for a double pointer?

I need to modify an open source project to avoid duplicating code (it is more efficient to create a GetGameRulesPtr() function than to keep going into the engine to retrieve the pointer). The problem is that it is stored as void **g_pGameRules. I've never really grasped the concept of a pointer to a pointer, and I am a bit confused.
I am creating a GetGameRules() function to retrieve this pointer, but I'm not sure whether my getter should have a void* return type and return *g_pGameRules, or how exactly I should go about this. I am brushing up on my pointer usage now, but wanted to find out the proper method to learn from.
Here is the code, lines 58-89 are the SDK function that retrieve the g_pGameRules pointer from the game engine. The other functions are what I am adding the getter function to.
// extension.cpp
class SDKTools_API : public ISDKTools
{
public:
    virtual const char *GetInterfaceName()
    {
        return SMINTERFACE_SDKTOOLS_NAME;
    }
    virtual unsigned int GetInterfaceVersion()
    {
        return SMINTERFACE_SDKTOOLS_VERSION;
    }
    virtual IServer *GetIServer()
    {
        return iserver;
    }
    virtual void *GetGameRules()
    {
        return *g_pGameRules;
    }
} g_SDKTools_API;
// extension.h
namespace SourceMod
{
    /**
     * @brief SDKTools API.
     */
    class ISDKTools : public SMInterface
    {
    public:
        virtual const char *GetInterfaceName() = 0;
        virtual unsigned int GetInterfaceVersion() = 0;
    public:
        /**
         * @brief Returns a pointer to IServer if one was found.
         *
         * @return IServer pointer, or NULL if SDKTools was unable to find one.
         */
        virtual IServer* GetIServer() = 0;
        /**
         * @brief Returns a pointer to GameRules if one was found.
         *
         * @return GameRules pointer, or NULL if SDKTools was unable to find one.
         */
        virtual void* GetGameRules() = 0;
    };
}
// vglobals.cpp
void **g_pGameRules = NULL;
void *g_EntList = NULL;
void InitializeValveGlobals()
{
    g_EntList = gamehelpers->GetGlobalEntityList();
    char *addr;
#ifdef PLATFORM_WINDOWS
    /* g_pGameRules */
    if (!g_pGameConf->GetMemSig("CreateGameRulesObject", (void **)&addr) || !addr)
    {
        return;
    }
    int offset;
    if (!g_pGameConf->GetOffset("g_pGameRules", &offset) || !offset)
    {
        return;
    }
    g_pGameRules = *reinterpret_cast<void ***>(addr + offset);
#elif defined PLATFORM_LINUX || defined PLATFORM_APPLE
    /* g_pGameRules */
    if (!g_pGameConf->GetMemSig("g_pGameRules", (void **)&addr) || !addr)
    {
        return;
    }
    g_pGameRules = reinterpret_cast<void **>(addr);
#endif
}
You want to return a void*, and do the casting back to the appropriate SomeType** within implementation code. This is because void** has strange semantics (which I can't find on google right now). It also tells more info to the user than they really need. The whole point of using void* to begin with was to avoid giving information to the user that they don't need.
If it is an option, I'd personally recommend avoiding void* altogether, and simply providing an opaque reference type for them to call your APIs with. One way to do this would be to define a fake structure, like struct GameObjectRef {};, and pass the user back a GameObjectRef*, casted from whatever pointer your system actually uses. This allows the user to write strongly typed code, so they can't accidentally provide the wrong pointer type to your functions, as they can with void*.
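A minimal sketch of the opaque reference idea, under the assumption that the real object type can stay hidden in implementation code. GameObjectRef comes from the paragraph above; GameRulesImpl, rules(), GetGameRulesRef() and GetRound() are hypothetical names invented for this example.

```cpp
// Public side: an empty tag type the caller can hold but not inspect.
struct GameObjectRef {};

// Implementation side (hypothetical): the real type is not exposed to callers.
namespace detail {
struct GameRulesImpl {
    int round;
};

// Stand-in for whatever object the engine actually owns.
inline GameRulesImpl& rules() {
    static GameRulesImpl instance = {1};
    return instance;
}
}  // namespace detail

// Hands out the real pointer disguised as a GameObjectRef*.
inline GameObjectRef* GetGameRulesRef() {
    return reinterpret_cast<GameObjectRef*>(&detail::rules());
}

// API functions cast back to the real type before touching the object.
inline int GetRound(GameObjectRef* ref) {
    return reinterpret_cast<detail::GameRulesImpl*>(ref)->round;
}
```

Compared with void*, passing an unrelated pointer such as an IServer* to GetRound() no longer compiles, because GameObjectRef* does not convert implicitly from other pointer types.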
How pointers (and pointers-to-pointers) work:
Imagine you are asking me where your aunt lives. Then, I hand you a piece of paper with an address to go to. That piece of paper is a pointer to a house.
Now, take that piece of paper with the address, take a photo of it with your digital camera, and place the image onto your personal wiki site.
Now, if your sister calls, asking for your aunt's address, just tell her to look it up on your wiki. If she asks for the URL, write it on a piece of paper for her. This second piece of paper is a pointer to a pointer to a house.
You can see how an address isn't the same as the real thing. Just because someone has your website address doesn't mean they know your aunt's address. And just because they have your aunt's address doesn't mean they're knocking on her door. The same is true for pointers to objects.
You can also see how you can make copies of addresses (pointers), but that doesn't make a copy of the underlying object. When you take a photo of your aunt's address, your aunt doesn't get a shiny new house.
And you can see how dereferencing a pointer will lead you back to the original object. If you go to the wiki site, you get your aunt's address. If you drive to that address, you can leave a package on her doorstep.
Note that these aren't perfect metaphors, but they are close enough to be somewhat descriptive. Real pointers-to-pointers are a lot cleaner than those examples. They describe only two things - the type of the final object (say, GameObject), and the number of levels of indirection (say, GameObject** - two levels).
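The metaphor maps onto code roughly as follows. House and the function names are invented for the illustration: one dereference follows a pointer, two dereferences follow a pointer to a pointer, and copying a pointer never copies the object.

```cpp
struct House {
    int number;
};

// One level of indirection: follow the address to the house.
inline int visit(House* address) {
    return address->number;
}

// Two levels: look up the address on the "wiki page" first, then follow it.
inline int visit_via_wiki(House** wiki_page) {
    return (*wiki_page)->number;
}

// Copying a pointer copies the address, not the house.
inline bool same_house(House* a, House* b) {
    return a == b;
}
```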
I think it's not a double pointer, but a pointer to a pointer, and yes, if you want to get a void* you must return *g_pGameRules.
The thing is that pointers are like levels. You must show which level you want to get.
A pointer to a pointer is useful if you want to have the pointer change (to a larger block of memory if you run out, for example), but keep a common reference to the item, wherever it moves.
If you are not going to be reallocating or moving the block pointed to, then dereferencing like you have in your getter is fine. I haven't looked at the library you are using, but one thing to consider is that it may have reference counting when you get an instance in order to ensure the object isn't changed after you get the pointer.
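A small sketch of why holding the address of the pointer is useful. Rules, engine_rules() and rules_slot() are hypothetical; engine_rules() plays the role of the engine's own pointer, which may be re-pointed at any time. Because the getter dereferences at call time, callers always see the current object.

```cpp
struct Rules {
    int id;
};

// Engine-side pointer that may be changed to point at a new object (hypothetical).
inline Rules*& engine_rules() {
    static Rules* current = nullptr;
    return current;
}

// We keep the ADDRESS of the engine's pointer, much like g_pGameRules does.
inline Rules** rules_slot() {
    return &engine_rules();
}

// Dereference once, at call time, to get whatever the engine points at now.
inline Rules* GetGameRules() {
    return *rules_slot();
}
```

A getter that cached the result of the dereference instead would keep returning the old object after the engine switched to a new one.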
So, what I would recommend is that you look to see if the library has any "Factory" functions or "instance" creating functions and use those.
-Dan8080

Pointer object in C++

I have a very simple class that looks as follows:
class CHeader
{
public:
    CHeader();
    ~CHeader();
    void SetCommand( const unsigned char cmd );
    void SetFlag( const unsigned char flag );
public:
    unsigned char iHeader[32];
};
void CHeader::SetCommand( const unsigned char cmd )
{
    iHeader[0] = cmd;
}
void CHeader::SetFlag( const unsigned char flag )
{
    iHeader[1] = flag;
}
Then, I have a method which takes a pointer to CHeader as input and looks
as follows:
void updateHeader(CHeader *Hdr)
{
    unsigned char cmd = 'A';
    unsigned char flag = 'B';
    Hdr->SetCommand(cmd);
    Hdr->SetFlag(flag);
    ...
}
Basically, this method simply sets some array values to a certain value.
Afterwards, I create then a pointer to an object of class CHeader and pass it to
the updateHeader function:
CHeader* hdr = new CHeader();
updateHeader(hdr);
In doing this, the program crashes as soon as it executes the Hdr->SetCommand(cmd)
line. Does anyone see the problem? Any input would be really appreciated.
When you run into a crash, act like a crime investigator: investigate the crime scene.
what is the information you get from your environment (access violation? any debug messages? what does the memory at *Hdr look like? ...)
Is the passed-in Hdr pointer valid?
Then use logical deduction, e.g.:
the dereferencing of Hdr causes an access violation
=> passed in Hdr points to invalid memory
=> either memory wasn't valid to start with (wrong pointer passed in), or memory was invalidated (object was deleted before passing in the pointer, or someone painted over the memory)
...
It's probably SEGFAULTing. Check the pointers.
After you added some source code, commented that the thing runs on another machine, and used the terms 'flag' and 'cmd' with some very small datatypes, I assume the target machine is quite limited in capacity. I suggest testing the result of new CHeader for validity: if the system runs out of resources, the resulting pointer will not refer to valid memory.
There is nothing wrong with the code you've provided.
Are you sure the pointer you've created has the same address once you enter the 'updateHeader' function? Just to be sure, after new() note the address, fill the memory, sizeof(CHeader), with something you know is unique like 0xDEAD, then trace into the updateHeader function, making sure everything is equal.
Other than that, I wonder if it is an alignment issue. I know you're using 8 bit values, but try changing your array to unsigned ints or longs and see if you get the same issue. What architecture are you running this on?
Your code looks fine. The only potential issue I can see is that you have declared a CHeader constructor and destructor in your class, but do not show the implementation of either. I guess you have just omitted to show these, else the linker should have complained (if I duplicate this project in VC++6 it comes up with an 'unresolved external' error for the constructor. It should also have shown the same error for the destructor if you had a... delete hdr; ...statement in your code).
But it is actually not necessary to have an implementation for every method declared in a class unless the methods are actually going to get called (any unimplemented methods are simply ignored by the compiler/linker if never called). Of course, in the case of an object one of the constructor(s) has to be called when the object is instantiated - which is the reason the compiler will create a default constructor for you if you omit to add any constructors to your class. But it will be a serious error for your compiler to compile/link the above code without the implementation of your declared constructor, so I will really be surprised if this is the reason for your problem.
But the symptoms you describe definitely sounds like the 'hdr' pointer you are passing to the updateHeader function is invalid. The reason being that the 1st time you are dereferencing this pointer after the updateHeader function call is in the... Hdr->SetCommand(cmd); ...call (which you say crashes).
I can only think of 2 possible scenarios for this invalid pointer:
a.) You have some problem with your heap and the allocation of memory with the 'new' operator failed on creation of the 'hdr' object. Maybe you have insufficient heap space. On some embedded environments you may also need to provide 'custom' versions of the 'new' and 'delete' operator. The easiest way to check this (and you should always do) is to check the validity of the pointer after the allocation:
CHeader* hdr = new CHeader();
if(hdr) {
    updateHeader(hdr);
}
else {
    //handle or throw exception...
}
The normal behaviour when 'new' fails should actually be to throw an exception - so the following code will cater for that as well:
CHeader* hdr = NULL; // declared outside the try block so it is still visible below
try {
    hdr = new CHeader();
} catch(...) {
    //handle or throw specific exception i.e. AfxThrowMemoryException() for MFC
}
if(hdr) {
    updateHeader(hdr);
}
else {
    //handle or throw exception...
}
b.) You are using some older (possibly 16 bit and/or embedded) environment, where you may need to use a FAR pointer (which includes the SEGMENT address) for objects created on the heap.
I suspect that you will need to provide more details of your environment plus compiler to get any useful feedback on this problem.
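For reference, here is a sketch of the two standard ways to detect a failed allocation in a conforming C++ implementation. Header is a simplified stand-in for CHeader and the function names are made up. Note that a null check after plain new only helps on toolchains where new does not throw.

```cpp
#include <new>

// Simplified stand-in for CHeader.
struct Header {
    unsigned char bytes[32];
};

// Style 1: nothrow new returns a null pointer on failure,
// so checking the result is meaningful.
inline Header* make_header_nothrow() {
    return new (std::nothrow) Header();
}

// Style 2: plain new reports failure by throwing std::bad_alloc;
// here the exception is translated into a null pointer for the caller.
inline Header* make_header_or_null() {
    try {
        return new Header();
    } catch (const std::bad_alloc&) {
        return nullptr;
    }
}
```

Either way, the caller can test the returned pointer before passing it on to a function such as updateHeader().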