WordSegmentation in SymSpellPlusPlus - c++

I'd like to use the C++ version of SymSpell, which is called SymSpellPlusPlus. In the C# version, using WordSegmentation looks like this (from the first link):
//word segmentation and correction for multi-word input strings with/without spaces
inputTerm="thequickbrownfoxjumpsoverthelazydog";
maxEditDistance = 0;
suggestion = symSpell.WordSegmentation(inputTerm);
//display term and edit distance
Console.WriteLine(suggestion.correctedString + " " + suggestion.distanceSum.ToString("N0"));
In the C++ version, the WordSegmentation method returns a shared_ptr (from the second link):
...
shared_ptr<WordSegmentationItem> WordSegmentation(const char* input)
{
return WordSegmentation(input, this->maxDictionaryEditDistance, this->maxDictionaryWordLength);
}
shared_ptr<WordSegmentationItem> WordSegmentation(const char* input, size_t maxEditDistance)
{
return WordSegmentation(input, maxEditDistance, this->maxDictionaryWordLength);
}
shared_ptr<WordSegmentationItem> WordSegmentation(const char* input, size_t maxEditDistance, size_t maxSegmentationWordLength)
{
// lines 1039 - 1179 under second link
std::vector<shared_ptr<WordSegmentationItem>> compositions;
...
return compositions[circularIndex];
}
In my code I tried, among other things, the following:
const char* inputTerm = "whereis th elove hehad dated forImuch of thepast who couqdn'tread in sixtgrade and ins pired him";
auto suggestions = symSpell.WordSegmentation(inputTerm);
But it gives an error:
free() invalid next size (fast)
It looks like a memory error, but I don't know how to fix it.
The WordSegmentationItem class looks as follows (lines 292-325 in the second link):
class WordSegmentationItem
{
public:
const char* segmentedString{ nullptr };
const char* correctedString{ nullptr };
u_int8_t distanceSum = 0;
double probabilityLogSum = 0;
WordSegmentationItem() { }
WordSegmentationItem(const symspell::WordSegmentationItem & p)
{
this->segmentedString = p.segmentedString;
this->correctedString = p.correctedString;
this->distanceSum = p.distanceSum;
this->probabilityLogSum = p.probabilityLogSum;
}
WordSegmentationItem& operator=(const WordSegmentationItem&) { return *this; }
WordSegmentationItem& operator=(WordSegmentationItem&&) { return *this; }
void set(const char* pSegmentedString, const char* pCorrectedString, u_int8_t pDistanceSum, double pProbabilityLogSum)
{
this->segmentedString = pSegmentedString;
this->correctedString = pCorrectedString;
this->distanceSum = pDistanceSum;
this->probabilityLogSum = pProbabilityLogSum;
}
~WordSegmentationItem()
{
delete[] segmentedString;
delete[] correctedString;
}
};
How should I get the correctedString from the WordSegmentationItem?

The library is buggy and the author needs to make some fixes.
First, compiling gives us a warning about SuggestItem::ShallowCopy, which returns a reference to a local variable. Very bad! We can probably change that to return by value.
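To illustrate the ShallowCopy warning, here is a minimal sketch of the bad pattern and the return-by-value fix. SuggestItem below is a hypothetical stand-in with made-up members (term, distance, count); the real class in symspell6.h may look different, and only the shape of ShallowCopy matters here.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for symspell's SuggestItem; members are assumptions.
struct SuggestItem {
    std::string term;
    int distance = 0;
    long long count = 0;

    // Broken pattern (do not use): returning a reference to a local object.
    //   SuggestItem& ShallowCopy() const { SuggestItem copy = *this; return copy; }
    // 'copy' is destroyed when the function returns, so the caller is left
    // holding a dangling reference.

    // Fixed pattern: return by value, so the caller receives its own object.
    SuggestItem ShallowCopy() const {
        SuggestItem copy = *this;
        return copy;
    }
};
```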
This doesn't fix the crash, though.
If we clone the library's repo and run the following test case in a debugger:
#include "symspell6.h"
int main()
{
const char* inputTerm = "whereis th elove hehad dated forlmuch of thepast who couqdn'tread in sixtgrade and ins pired him";
symspell::SymSpell symSpell;
auto suggestions = symSpell.WordSegmentation(inputTerm);
}
…we see that returning compositions[circularIndex] from the WordSegmentation function is causing an invalid access in the shared_ptr constructor. This suggests that circularIndex is out-of-bounds and giving us a non-existent shared_ptr. Indeed, circularIndex is 95 but compositions.size() is 0!
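While investigating, one way to turn this silent out-of-bounds read into a clear error is std::vector::at(), which checks the index and throws std::out_of_range. The sketch below uses a hypothetical element type Item and a hypothetical helper checkedPick; it is a debugging aid, not a fix for the library.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

struct Item { int value = 0; };  // hypothetical element type

// compositions[95] on an empty vector reads memory the vector does not own
// (undefined behaviour). at() performs a bounds check and throws
// std::out_of_range instead, so the failure is reported where it happens.
std::shared_ptr<Item> checkedPick(const std::vector<std::shared_ptr<Item>>& compositions,
                                  std::size_t circularIndex) {
    return compositions.at(circularIndex);
}
```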
The function is lacking some serious error checking.
Now, only the author (or at least someone who knows what the library is supposed to do; that's not me!) can fix this properly. But as a quick patch I added the following after line 1055:
if (compositions.empty())
return nullptr;
…and it now at least runs.
It seems that the function assumes the dictionary is non-empty. I don't know whether that's expected behaviour or not (other than the missing error checking as detailed above).
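Putting the patch and the caller together, here is a minimal sketch of how to read correctedString safely: the function may now return nullptr, so test the shared_ptr before dereferencing it. SegmentationResult, pickComposition and printCorrection are hypothetical stand-ins for WordSegmentationItem and the library code; the stand-in deliberately has no destructor, so it does not reproduce the real class's ownership of its strings. The extra circularIndex >= size() check goes slightly beyond the quick patch.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <memory>
#include <vector>

// Hypothetical stand-in for symspell::WordSegmentationItem (members only).
struct SegmentationResult {
    const char* segmentedString = nullptr;
    const char* correctedString = nullptr;
    int distanceSum = 0;
};

// Mirrors the quick patch: return nullptr instead of indexing out of range.
std::shared_ptr<SegmentationResult> pickComposition(
        const std::vector<std::shared_ptr<SegmentationResult>>& compositions,
        std::size_t circularIndex) {
    if (compositions.empty() || circularIndex >= compositions.size())
        return nullptr;
    return compositions[circularIndex];
}

// Caller side: check the pointer (and the string) before using them.
void printCorrection(const std::shared_ptr<SegmentationResult>& suggestion) {
    if (suggestion && suggestion->correctedString)
        std::cout << suggestion->correctedString << " " << suggestion->distanceSum << '\n';
    else
        std::cout << "no segmentation (is the dictionary loaded?)\n";
}
```

With the real library the caller side looks the same: auto suggestion = symSpell.WordSegmentation(inputTerm); then only read suggestion->correctedString inside if (suggestion) { ... }.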
The project is in serious need of some documentation, because no preconditions or postconditions are mentioned for these functions and there is no indication as to how the library is supposed to be used. Again, the author should fix these things.

Related

Is there a C++ Language Feature Mark a Variable to Prevent/Warn about Later Usage in the same Scope?

One of the common mistakes I find in C-like code is using a variable that has already been consumed in the given scope. See the following hypothetical example function:
std::string normalizePath(const std::string &path) {
const auto fixedPath = fixDirectorySeparators(path);
if (fixedPath.starts_with('/')) {
return normalizeAbsolutePath(fixedPath);
}
return normalizeRelativePath(path); // use of the wrong variable.
}
In this simple function the problem is easy to spot; in more complex code it can be a source of errors.
The example above is deliberately oversimplified to illustrate my question. The code could also look like this:
bool isRangeERZ(int) { ... }
int calculateNextGHA(int gha, int xfactor) {
if (xfactor > 8) {
return 0;
}
if (gha < 0x8008 || xfactor == 0) {
return gha;
}
if (gha > 0x10000) {
return xfactor;
}
int nextGha = gha * 12 * xfactor;
if (isRangeERZ(nextGha) && xfactor > 4) {
return nextGha / 4;
}
return gha; // mistake made here
}
What the examples have in common is that at least one parameter has the same type as the return value.
With the introduction of move semantics in C++11, I noticed that all the compilers I use easily spot obvious problems like this:
std::string moveExample() {
auto preparedString = createText(); // moved here
auto finalString = processText(std::move(preparedString));
return preparedString; // compiler warns, or stops with error.
}
Now, I wonder: Is there a language feature or a way to get the same effect, even if the variable is copied and not moved?
It could look like this:
std::string normalizePath(const std::string &path) {
const auto fixedPath = fixDirectorySeparators(std::mark_as_consumed(path));
if (fixedPath.starts_with('/')) {
return normalizeAbsolutePath(fixedPath);
}
return normalizeRelativePath(path); // Compiler warns, `path` was consumed.
}
or like this:
std::string normalizePath(const std::string &path) {
const auto fixedPath = fixDirectorySeparators(path);
std::mark_obsolete(path);
if (fixedPath.starts_with('/')) {
return normalizeAbsolutePath(fixedPath);
}
return normalizeRelativePath(path); // Compiler warns, `path` is obsolete at this point.
}
Is there already a C++ language feature (or open proposal) that helps with this kind of problem?
I'm not aware of any language feature that gives you what you want. You could introduce the notion of Depleted/Full objects into your application, but I wouldn't go that way. Instead, structure your code so that the stale variable is simply not in scope:
Disclaimer: the example shows how to split the code logically and is not optimal.
// helpers are defined before use so that normalizePath compiles
std::string fixDirectorySeparators(const std::string& path) {...}
std::string normalizeFixedPath(const std::string& path) {
if (path.starts_with('/')) {
return normalizeAbsolutePath(path);
}
return normalizeRelativePath(path);
}
std::string normalizePath(const std::string& path)
{
return normalizeFixedPath(fixDirectorySeparators(path));
}
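Here is a self-contained version of the split, to show that it compiles and behaves as intended. The bodies of fixDirectorySeparators, normalizeAbsolutePath and normalizeRelativePath are hypothetical placeholders (the question never shows them). The point is that normalizeFixedPath only ever sees the fixed path, so the original one cannot be used by mistake.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper implementations; the real ones are not shown.
std::string fixDirectorySeparators(const std::string& path) {
    std::string fixed = path;
    std::replace(fixed.begin(), fixed.end(), '\\', '/');  // backslashes -> slashes
    return fixed;
}
std::string normalizeAbsolutePath(const std::string& path) { return "abs:" + path; }
std::string normalizeRelativePath(const std::string& path) { return "rel:" + path; }

// Only the fixed path is in scope here, so the raw input cannot leak in.
std::string normalizeFixedPath(const std::string& fixedPath) {
    if (!fixedPath.empty() && fixedPath.front() == '/') {  // C++20: fixedPath.starts_with('/')
        return normalizeAbsolutePath(fixedPath);
    }
    return normalizeRelativePath(fixedPath);
}

std::string normalizePath(const std::string& path) {
    return normalizeFixedPath(fixDirectorySeparators(path));
}
```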

Access violation new c++ struct

I have this struct:
struct event_ {
bool is_crossover = false;
bool is_birth = false;
bool is_repetitive = false;
int eID = 0;
bool inicio_fin = false;
fecha inicio_fecha;
fecha fin_fecha;
locacion inicio_l;
string eLatitud_i = 0;
string eLongitud_i = 0;
locacion fin_l;
string eLatitud_f = 0;
string eLongitud_f = 0;
personaje_info personajes_evento; //This is a class
int cantidad_personajes = 0;
string nombre;
string descripcion;
string tipo_evento;
event_ *sig, *ant;
};
Then, when I execute:
event_ *n = new event_;
I get an access violation error:
Exception thrown at 0x0F69F6E0 (ucrtbased.dll) in Auxiliar Libros.exe: 0xC0000005: Access violation reading location 0x00000000.
Does anyone know why this is happening?
As additional information: the program worked perfectly fine before I ran a Code Metrics Analysis. The analysis also reports exceptions; what should I do about them?
This code
string eLongitud_f = 0;
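calls the std::string constructor that takes a const char*, passing it a null pointer (0 is another way of writing the null pointer), which results in your access violation error. The other string members initialised with 0 (eLatitud_i, eLongitud_i and eLatitud_f) have the same problem.
What do you think that code is doing? 0 is an integer, not a string. Did you mean this?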
string eLongitud_f = "0";
Or did you mean this?
string eLongitud_f = "";
Maybe you even meant this
double eLongitud_f = 0.0;
You can also just have this
string eLongitud_f;
which is the same as the second alternative above. All of these are possible and it's hard to know which one you really want, but the fundamental problem is that you have a string variable and you are trying to give it a value that is not a string.
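As a sketch of the corrected declarations, here is a cut-down event_ that keeps only a few members. fecha, locacion and personaje_info are omitted because their definitions are not shown, and which alternative each coordinate should use is an assumption. The sig and ant pointers are also initialised to nullptr, which the original struct does not do.

```cpp
#include <cassert>
#include <string>

// Cut-down, hypothetical version of event_ with valid initialisers.
struct event_ {
    bool is_crossover = false;
    int eID = 0;
    std::string eLatitud_i;          // default constructor: empty string
    std::string eLongitud_i = "";    // explicit empty string, same effect
    std::string eLatitud_f = "0";    // the text "0", if that is what was meant
    double eLongitud_f = 0.0;        // or store the coordinate as a number
    std::string nombre;
    event_ *sig = nullptr, *ant = nullptr;  // links start out empty
};
```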
To solve your problem, I think the best thing to do is to reduce your code and try different combinations.
First, try a small struct with only one bool member to check whether the allocation itself works:
struct event_
{
bool is_crossover = false;
};
event_ *n = new event_;
If your program continues to crash, your error is there, in the new expression.
Otherwise, reduce your structure by removing the members you believe are correct.
Personally, I think all your bool, int and event_* declarations are correct, so I removed them.
I also removed declarations that have the same type as a member I kept.
That leaves the following structure:
struct event_
{
fecha fin_fecha;
locacion inicio_l;
string eLatitud_i = 0;
personaje_info personajes_evento;
};
What happens when you build and run this code?
If your program no longer crashes, the error is in the removed code.
Otherwise, one (or more) of the declarations in this reduced structure is incorrect.
If changing your struct affects too much of your code, create a similar structure under a new, unused name and test that instead.
Can you try this? I think you will quickly solve the problem yourself!
Your original struct has too many members that could be producing the crash to find it by inspection alone.

c++ invalid pointer/double free on class with array member

#include <iostream>
using namespace std;
const int ALPHABET = 26;
const int LANG = 4;
const double TOLK[LANG][ALPHABET]= {{0}};
class Text
{
private:
string sample;
int* histogram;
double* rel_histogram;
int sample_size;
public:
Text();
~Text();
string parse();
};
string parsing(const double TOLK[][ALPHABET], double rel_occurence_arr[]);
int main()
{
Text myText;
myText.parse();
return 0;
}
Text::Text(){
sample = "";
histogram = new int[ALPHABET];
rel_histogram = new double[ALPHABET];
sample_size = 0;
}
Text::~Text(){
delete[] histogram;
delete[] rel_histogram;
}
string Text::parse(){
parsing(TOLK, rel_histogram);
//Invalid pointer here
}
string parsing(const double TOLK_HJALP[][ALPHABET], double rel_occurence_arr[]){
return "test";
}
This is part of a larger program, but I've peeled off everything I could until only the parts causing the error remain. Running it like this results in an invalid pointer error; running it with all the extra bits causes a double free/corruption error. But I think that if I can figure it out at this level, I can probably figure it out at the larger scale.
From what I've gathered, I think the Text class is trying to delete something that was already deleted when the parsing function returned. I don't know if that is correct, but if it is, I have no idea how to stop it from happening. Sending a copy doesn't help either (at least in the way I tried; maybe there is more than one way?).
Also, removing the #include <iostream> seems to make the error go away, for whatever reason. Why is that? It isn't even used here.
Thanks in advance.
There are two issues with your code that I can see.
(1) You are not including <string>, and <iostream> is not guaranteed to include it. Your code only compiles because your standard library happens to pull std::string in through <iostream>; add #include <string>.
(2) This is what is causing your error: parse doesn't return a value, but its declaration promises a string. Flowing off the end of a non-void function is undefined behavior; typically the caller then destroys a std::string that was never constructed, which glibc reports as an invalid pointer or a double free.
Note: You should compile with -Wall when you run into a problem (or just all the time). It would have warned you about the missing return value (-Wreturn-type).
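A minimal corrected sketch of the reduced program: it includes <string> explicitly and makes parse return the result of parsing. As an additional assumption, the raw new[]/delete[] arrays are swapped for std::vector, which frees its own memory and copies correctly. With raw pointers and no user-defined copy constructor, copying a Text would make two objects delete[] the same arrays, which is one common source of double free errors in larger programs. The sample and sample_size members are left out for brevity.

```cpp
#include <cassert>
#include <string>   // include <string> explicitly instead of relying on <iostream>
#include <vector>

const int ALPHABET = 26;
const int LANG = 4;
const double TOLK[LANG][ALPHABET] = {{0}};

std::string parsing(const double TOLK_HJALP[][ALPHABET], double rel_occurence_arr[]) {
    // placeholder body, as in the question; the parameters are not used yet
    return "test";
}

class Text {
private:
    std::vector<int> histogram;         // replaces int* + new int[ALPHABET]
    std::vector<double> rel_histogram;  // replaces double* + new double[ALPHABET]
public:
    Text() : histogram(ALPHABET), rel_histogram(ALPHABET) {}
    // No destructor needed: the vectors release their own memory, and the
    // compiler-generated copy operations copy the contents.
    std::string parse() {
        return parsing(TOLK, rel_histogram.data());  // every path now returns a string
    }
};
```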

*** glibc detected *** ms2: free(): invalid pointer: 0xb7526ff4 ***

I keep getting this error and I'm not sure how to correct it, since my code editor reports no errors. I have looked up similar issues, but I am still having trouble understanding how to apply the solutions here. I've tried altering my code for several hours now, to no avail. Any help would be appreciated. I have provided my .h and .cpp files below.
ErrorMessage.h
#ifndef SICT_ERRORMESSAGE_H
#define SICT_ERRORMESSAGE_H
#include <iostream>
namespace sict {
class ErrorMessage {
char* message_; //pointer that holds the address of the message stored in current object
public:
explicit ErrorMessage(const char* errorMessage = nullptr); //receive address of a C-style nullterminate string holding an error message
ErrorMessage(const ErrorMessage& em) = delete; //deleted copy constructor that prevents copying of an ErrorMessage object
ErrorMessage& operator=(const ErrorMessage& em) = delete; //deleted assignment operator that prevents assignment of ErrorMessage object to current object
virtual ~ErrorMessage(); //deallocates any memory that has been dynamically allocated by the current object
void clear(); //clears any message stored by current object and initialize object to safe, empty state
bool isClear() const; //return true if object is in a safe, empty state
void message(const char* str); //stores a copy of the C-style string pointed to by str
const char* message() const; //return address of the message stored in current object
};
//helper operator
std::ostream& operator<<(std::ostream& os, const ErrorMessage& err);
}
#endif
ErrorMessage.cpp
#define _CRT_SECURE_NO_WARNINGS
#include <iostream>
#include <cstring>
#include "ErrorMessage.h"
namespace sict {
ErrorMessage::ErrorMessage(const char* errorMessage) {
if(errorMessage == nullptr) {
message_ = nullptr;
}
else {
message(errorMessage);
}
}
ErrorMessage::~ErrorMessage() {
delete [] message_;
}
void ErrorMessage::clear() {
delete [] message_;
message_ = nullptr;
}
bool ErrorMessage::isClear() const {
if(message_ == nullptr) {
return true;
}
else {
return false;
}
}
void ErrorMessage::message(const char* str) {
delete [] message_;
message_ = new char[strlen(str) + 1];
strcpy(message_, str);
}
const char* ErrorMessage::message() const {
return message_;
}
std::ostream& operator<<(std::ostream& os, const ErrorMessage& err) {
if(!err.isClear()) {
os << err.message();
}
return os;
}
}
It's not surprising your code made it through editor syntax checks and compilation: it's valid code. It just has an incorrect pointer somewhere.
This may mean that you're accidentally dereferencing something, or perhaps passing a value where you should be passing a pointer. You should get a compile-time warning about that kind of thing.
Another possibility is that you're failing to initialize some pointer, and its value happens to be 0xb75....
Clearly, neither you nor I are likely to guess where this error originates. As Sam Varshavchik pointed out in a comment, you don't even know whether the code you posted is the source of the error. Even if you guess your way through this one (or perhaps keenly observe, Sam), it's just plain silly to try to write C++ that way.
What you need is a debugger. A debugger is a program you run your program within, and it keeps track of the program's state so that when you have a memory violation, the debugger can produce a backtrace showing where in your source code the error occurred. You also have to compile your program with debugging support (for gcc, the -g flag), so that the debugger has markers it can use to refer back to the source code.
It's a process far beyond the scope of your question, but one that's easy to learn about once you know what you're looking for. Look for one that integrates with your IDE, if possible, since you're relying on your development environment heavily. It's quite possible that you already have it set up; you might just need to use it. Search for C++ debugging in the context of your editor first. If that turns up nothing, consider searching under your compiler suite, whatever that may be (if you're using open source tools, you're probably using gcc, and the matching debugger is gdb).
You're about to gain a far more accurate understanding of what it is to program C / C++. Good luck.
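For what it's worth, the uninitialized-pointer case described above does occur in the posted ErrorMessage.cpp: when the constructor receives a non-null string, it calls message(), whose first statement is delete [] message_, and message_ has not been set yet at that point. Below is a reduced sketch of the class (outside the sict namespace, with clear() and operator<< left out) that initialises the pointer before anything can delete it; the rest of the original class is assumed unchanged.

```cpp
#include <cassert>
#include <cstring>

class ErrorMessage {
    char* message_ = nullptr;  // default member initializer: never indeterminate
public:
    explicit ErrorMessage(const char* errorMessage = nullptr) {
        if (errorMessage != nullptr)
            message(errorMessage);  // safe now: delete[] on nullptr is a no-op
    }
    ErrorMessage(const ErrorMessage&) = delete;
    ErrorMessage& operator=(const ErrorMessage&) = delete;
    ~ErrorMessage() { delete[] message_; }

    void message(const char* str) {
        delete[] message_;
        message_ = new char[std::strlen(str) + 1];
        std::strcpy(message_, str);
    }
    const char* message() const { return message_; }
    bool isClear() const { return message_ == nullptr; }
};
```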

C++ - Passing Pointer Into Function

I keep receiving odd, unexpected values for my bool testValue. I keep getting random numbers; I believe it is accessing another region of memory. I suspect the problem is how my code is set up within my testNumber() function, but I am unsure how to solve it. This is my logic:
I have set ok to true. Now I assign the memory address of ok to pOk.
void TextBox::lengthTest(bool *pOk, int length) {
bool ok;
if (length < MAX_LENGTH) {
ok = true;
pOk = &ok;
} else {
ok = false;
pOk = &ok;
}
}
bool lengthTestBool = lengthTest(*pOk, length);
cout << lengthTestBool <<;
output:
85
You have a fundamental misunderstanding of how one uses pointers to implement reference semantics. You want to change the thing that is pointed to by the pointer:
*pOk = ok;
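Applied to the code in the question, here is a minimal sketch of the pointer version. Assigning pOk = &ok; only changes the function's local copy of the pointer (and ok is destroyed when the function returns), whereas *pOk = ok; writes into the caller's bool. MAX_LENGTH = 10 is a hypothetical value, and lengthTest is shown as a free function rather than a TextBox member.

```cpp
#include <cassert>

const int MAX_LENGTH = 10;  // hypothetical value; the question does not show it

// Writes the result into the bool that pOk points to.
void lengthTest(bool* pOk, int length) {
    bool ok = length < MAX_LENGTH;
    *pOk = ok;  // modify the pointed-to object, not the local pointer
}
```

The caller passes the address of its own bool: bool lengthTestBool; lengthTest(&lengthTestBool, length);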
However, C++ actually supports reference semantics natively through reference types, which may be preferable:
void testNumber(bool & OK, int n)
{
OK = true;
// ...
}
Even better, though, is to simply return a bool:
bool testNumber(int n) { /* ... */ }
if (testNumber(x)) //... etc.
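The return-value version, sketched with the question's names (MAX_LENGTH = 10 is again a hypothetical value), needs no pointer or reference at all; the caller simply writes cout << lengthTest(length);.

```cpp
#include <cassert>

const int MAX_LENGTH = 10;  // hypothetical value

// Return the result directly instead of writing through a parameter.
bool lengthTest(int length) {
    return length < MAX_LENGTH;
}
```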