Error comparing French characters in a C++ string - c++

I was wondering if any of you could help me with a problem I'm having. Currently I have a function that takes in a C-style string, creates a temporary C++ string, stores the C string in it, and uses find_first_not_of to look for invalid characters. The set of valid characters includes French characters like 'à'. However, when I pass in a string containing French characters, it doesn't recognize them as valid.
I am using Visual Studio 2013 on Windows 8, and a few people have told me that the issue is that how VS encodes its files is different from how it encodes input from the command prompt, but I do not know how to fix that. Do any of you know how I would go about doing this? Or is it a different problem with my code entirely?
My code for the function is as follows:
bool checkValidCharacters(const char* input)
{
std::string checkString(input);
bool validCharacters = false;
std::size_t found = checkString.find_first_not_of("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZàâäèêëîôùûüÿçÀÂÄÈÉÊÎÏÔÙÛÜŸÇ-. ");
if (found != std::string::npos)
{
printf("Error: Invalid character: %c", input[found]);
}
else
{
printf("All characters valid\n");
validCharacters = true;
}
return validCharacters;
}
Thanks a bunch.
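One way to sidestep the mismatch between source-file and console encodings is to compare code points instead of raw bytes. The following is only a minimal sketch, assuming the input has already been converted to UTF-8 (on Windows, converting from the console code page is a separate step that is not shown here). decodeUtf8 and hasOnlyValidCharacters are hypothetical helper names, and the allowed set mirrors the list in the question, written with \u escapes so that it does not depend on how the source file is saved.

```cpp
#include <cstddef>
#include <string>

// Decodes the UTF-8 sequence starting at text[pos] and advances pos past it.
// Returns false on malformed input. Minimal decoder: overlong forms are not rejected.
bool decodeUtf8(const std::string& text, std::size_t& pos, char32_t& codePoint) {
    const unsigned char lead = static_cast<unsigned char>(text[pos]);
    std::size_t length = 0;
    if (lead < 0x80)                { length = 1; codePoint = lead; }
    else if ((lead & 0xE0) == 0xC0) { length = 2; codePoint = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { length = 3; codePoint = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { length = 4; codePoint = lead & 0x07; }
    else return false;                         // stray continuation or invalid lead byte
    if (pos + length > text.size()) return false;  // truncated sequence
    for (std::size_t i = 1; i < length; ++i) {
        const unsigned char next = static_cast<unsigned char>(text[pos + i]);
        if ((next & 0xC0) != 0x80) return false;   // not a continuation byte
        codePoint = (codePoint << 6) | (next & 0x3F);
    }
    pos += length;
    return true;
}

// Returns true if every code point of the (assumed UTF-8) input is in the allowed set.
bool hasOnlyValidCharacters(const std::string& utf8Input) {
    static const std::u32string allowed =
        U"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-. "
        U"\u00E0\u00E2\u00E4\u00E8\u00EA\u00EB\u00EE\u00F4\u00F9\u00FB\u00FC\u00FF\u00E7"
        U"\u00C0\u00C2\u00C4\u00C8\u00C9\u00CA\u00CE\u00CF\u00D4\u00D9\u00DB\u00DC\u0178\u00C7";
    std::size_t pos = 0;
    while (pos < utf8Input.size()) {
        char32_t codePoint = 0;
        if (!decodeUtf8(utf8Input, pos, codePoint)) return false;
        if (allowed.find(codePoint) == std::u32string::npos) return false;
    }
    return true;
}
```

With this approach a lone byte from a single-byte code page (for example 0xE0 on its own) is reported as invalid rather than silently compared against the wrong character.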

Related

C++ Recognize UTF-8 or Hebrew language

I'm working on some code whose goal is to recognize whether two strings are equal.
There are two kinds of string: string 1 comes from a text file, and string 2 comes from the server side, from a chat packet.
I have tried many different options (this is my latest attempt), but nothing works: the sentences are never recognized as equal. For example, the string in the text file is "בדיקה" and the string that came from the packet side is "בדיקה" too, and they still do not compare equal.
if(gSentenceEvent.IsRunning())
{
std::string s = lpMsg->message;
int Len = strlen(gSentenceEvent.RandomSentence);
std::string str;
str.assign(gSentenceEvent.RandomSentence, gSentenceEvent.RandomSentence + Len);
if (str.compare(s) == 0)
{
gSentenceEvent.SetRunning(false);
gNotice.GCNoticeSendToAll(0,0,0,0,0,0,gMessage.GetMessage(1130));
gNotice.GCNoticeSendToAll(0,0,0,0,0,0,gMessage.GetMessage(1127),lpObj->Name);
}
else
{
gNotice.GCNoticeSendToAll(0,0,0,0,0,0,"%s Try %s\n",lpObj->Name,s);
gNotice.GCNoticeSendToAll(0,0,0,0,0,0,"Answer Is %s\n",str);
}
}
If anyone has an idea for solving this issue, I will be happy to hear how to make the comparison work.
Thanks in advance!
I also tried converting the text to wstring, but still nothing.
I checked the hex value of both sentences to see whether they are equal:
gNotice.GCNoticeSendToAll(0,0,0,0,0,0,"%.2X",lpMsg->message);
gNotice.GCNoticeSendToAll(0,0,0,0,0,0,"%.2X",gSentenceEvent.RandomSentence);
The values really do come out different, for example with "בדיקהה 3" on both sides:
ServerSide = 22B6970
TextFile = D9C0B0
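The two values above are most likely the addresses of the buffers rather than their contents, since "%.2X" is being given a pointer. A minimal sketch of a byte-wise hex dump that would show the actual encoding of each string follows; toHexBytes is a hypothetical helper and not part of the code above.

```cpp
#include <cstdio>
#include <string>

// Returns the bytes of text as space-separated uppercase hex pairs.
std::string toHexBytes(const std::string& text) {
    std::string out;
    char buffer[4];
    for (unsigned char byte : text) {
        // Two hex digits per byte; the cast avoids sign extension.
        std::snprintf(buffer, sizeof buffer, "%02X", static_cast<unsigned>(byte));
        if (!out.empty()) out += ' ';
        out += buffer;
    }
    return out;
}
```

Dumping both strings this way shows whether they really hold the same bytes. If one side is UTF-8 and the other is a single-byte code page such as Windows-1255, the dumps differ even though both display as the same text: "ב" is D7 91 in UTF-8 but E1 in Windows-1255. Separately, assuming GCNoticeSendToAll is printf-style, a std::string passed for "%s" should be passed as s.c_str().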

C++, how to remove char from a string

I have to remove some chars from a string, but I have some problems. I found this part of code online, but it does not work so well, it removes the chars but it even removes the white spaces
string messaggio = "{Questo e' un messaggio} ";
char chars[] = {'Ì', '\x1','\"','{', '}',':'};
for (unsigned int i = 0; i < strlen(chars); ++i)
{
messaggio.erase(remove(messaggio.begin(), messaggio.end(), chars[i]), messaggio.end());
}
Can someone tell me how this part of code works and why it even removes the white spaces?
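Because you use strlen on your chars array. strlen stops only when it encounters a '\0', and your array contains none, so the loop keeps reading memory past the end of the array. That is undefined behavior and could even cause a segfault; here it happens to find a space among the bytes that follow.
Keep the erase call as well: std::remove only shifts the kept characters to the front and returns the new logical end; it does not shorten the string.
A correction could be:
char chars[] = {'I', '\x1','\"','{', '}',':'};
for (unsigned int i = 0; i < sizeof(chars); ++i)
{
    messaggio.erase(std::remove(messaggio.begin(), messaggio.end(), chars[i]), messaggio.end());
}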
Wisblade's answer is more or less correct; it just lacks some technical details.
As mentioned, strlen searches for the terminating character '\0'.
Since chars does not contain such a character, this code invokes undefined behavior (it reads past the end of the buffer).
Undefined behavior means anything can happen: the code may work, may crash, or may give invalid results.
So the first step is to drop strlen and use a different way to get the size of the array.
There is also another problem. Your code uses a non-ASCII character: 'Ì'.
I assume that you are using Windows and Visual Studio. By default the MSVC compiler assumes that the file is encoded using your system locale and uses the same locale to generate the executable. Windows by default uses a single-byte encoding specific to your language (to be compatible with very old software). Only in that case does your code have a chance to work. On platforms or configurations with a multibyte encoding such as UTF-8, this code can't work even after Wisblade's fix.
Wisblade's fix can take this form (note that I changed the order of the loops; the iteration over the characters to remove is now the inner loop):
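#include <algorithm>
#include <iterator>
#include <string>

bool isCharToRemove(char ch)
{
    // 'Ì' only fits in a char when a single-byte encoding is in use.
    constexpr char skipChars[] = {'Ì', '\x1','\"','{', '}',':'};
    return std::find(std::begin(skipChars), std::end(skipChars), ch) != std::end(skipChars);
}
std::string removeMagicChars(std::string message)
{
    // Erase-remove idiom: remove_if compacts, erase shortens the string.
    message.erase(
        std::remove_if(message.begin(), message.end(), isCharToRemove),
        message.end());
    return message;
}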
Let me know if you need a solution that can handle more complex text encodings.
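For a multibyte encoding, one possible direction is to remove whole byte sequences instead of single chars. This is only a sketch, assuming the message is UTF-8 and that each unwanted character is supplied as its complete UTF-8 byte sequence; removeSequences is a hypothetical name. Because UTF-8 is self-synchronizing, searching for a complete sequence cannot match in the middle of another character.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Removes every occurrence of each unwanted byte sequence from message.
std::string removeSequences(std::string message,
                            const std::vector<std::string>& unwanted) {
    for (const std::string& sequence : unwanted) {
        if (sequence.empty()) continue;  // an empty pattern would loop forever
        std::size_t pos = 0;
        while ((pos = message.find(sequence, pos)) != std::string::npos) {
            message.erase(pos, sequence.size());  // keep pos: text has shifted left
        }
    }
    return message;
}
```

Here 'Ì' would be passed as "\xC3\x8C", its UTF-8 encoding, next to the ASCII entries such as "{" and ":".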

String handling with Nordic characters is difficult in C++

I have tried many ways to solve this problem. I just want to split a string or do stuff with each character. As soon as there are Nordic characters in the string, it's not possible to split that string.
The length() function returns the right answer if we look at memory use, but that's not the same as the string length. "ABCÆØÅ" does not have 6 as the length, it has 9: one extra for each special character.
Anybody with a good answer?
The test below shows the problem: some letters and a lot of ? marks. :-(
int main()
{
string name = "some æøå string";
for_each(name.begin(), name.end(), [] (char c) {
cout << c;
cout << endl;
});
}
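If your terminal supports UTF-8, there should be no problem using std::cout with the string you enter, but you need to tell the compiler that you typed a UTF-8 string, like this (this works up to C++17; since C++20 a u8 literal has type const char8_t[] and can no longer initialize a std::string directly):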
int main()
{
string name = u8"some æøå string";
for_each(name.begin(), name.end(), [] (char c) {
cout << c;
cout << endl;
});
cout<<name; //this will also work
return 0; //add this just to be tidy
}
You need to do that because a character in UTF-8 may take 1, 2, 3 or 4 bytes, depending on its code point.
Then, depending on what you need to do (for example, splitting between characters), you should create a function that detects how long each UTF-8 character is. Then you create a 'string' for each UTF-8 character and extract as many characters as needed from the original string.
There is a very good (and very compact) library, utf8proc, that lets you do such things.
utf8proc has helped me resolve this kind of issue in many projects.
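The length-detection function described above can be sketched by hand as follows. This is a minimal illustration, assuming the input is UTF-8; utf8SequenceLength and splitUtf8 are hypothetical names, and the code splits by code point only (it does not validate continuation bytes or group combining marks, which a library such as utf8proc handles more thoroughly).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Number of bytes in the UTF-8 sequence introduced by the given lead byte.
// Invalid lead bytes count as 1 so that the caller always makes progress.
std::size_t utf8SequenceLength(unsigned char lead) {
    if ((lead & 0x80) == 0x00) return 1;  // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 1;                             // stray continuation or invalid byte
}

// Splits text into one std::string per UTF-8 encoded character.
std::vector<std::string> splitUtf8(const std::string& text) {
    std::vector<std::string> characters;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t length = utf8SequenceLength(static_cast<unsigned char>(text[pos]));
        if (length > text.size() - pos) length = text.size() - pos;  // truncated tail
        characters.push_back(text.substr(pos, length));
        pos += length;
    }
    return characters;
}
```

For the 9-byte UTF-8 encoding of "ABCÆØÅ", splitUtf8 returns 6 elements, which is the length the question was expecting.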

read txt file in c++ (chinese)

I'm trying to develop a function that checks whether a Chinese word the user enters is in the txt file. The following is the code, but it is not working. I want to know what the problem is. Please help.
setlocale(LC_ALL, "Chinese-simplified");
locale::global(locale("Chinese_China"));
SetConsoleOutputCP(936);
SetConsoleCP(936);
bool exist = FALSE;
cout << "\n\n <Find the keyword whether it is in that image or not> \n ";
cout << "Enter word to search for: ";
wstring search;
wcin >> search; //There is a problem to enter chinese.
wfstream file_text("./a.txt");
wstring line;
wstring::size_type pos;
while (getline(file_text, line))
{
pos = line.find(search);
if (pos != wstring::npos) // string::npos is returned if string is not found
{
cout << "Found!" << endl;
exist = true;
break;
}
}
When I use this code, the result is as follows.
const int oldMbcp = _getmbcp();
_setmbcp(936);
const std::locale locale("Chinese_China.936");
_setmbcp(oldMbcp);
If you're interested in more details, please see stod-does-not-work-correctly-with-boostlocale for a more detailed description of how locales work.
In a nutshell, the more interesting part for you:
Every standard stream (stringstream, fstream, cin, cout) holds its own locale object, which matches the value of the global C++ locale at the moment the stream object was created. As std::cin is created long before the code in your main is called, it most probably has the classic "C" locale, no matter what you do afterwards.
You can make sure that a stream object has the desired locale by calling imbue(std::locale(your_favorite_locale)) on it.
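The two points above can be illustrated with a small sketch. It uses a hypothetical CommaDecimal facet instead of a named locale such as "Chinese_China.936", because named locales may not be installed on every system; formatWith and streamKeepsCreationLocale are illustrative names as well.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet used only for this demo: prints a decimal comma.
struct CommaDecimal : std::numpunct<char> {
    char do_decimal_point() const override { return ','; }
};

// Formats value through a stream that has been imbued with loc.
std::string formatWith(const std::locale& loc, double value) {
    std::ostringstream out;
    out.imbue(loc);  // affects only this stream, not the global locale
    out << value;
    return out.str();
}

// Returns true if a stream created before a global-locale change
// still carries the locale it was created with.
bool streamKeepsCreationLocale() {
    std::ostringstream early;  // copies the global locale at construction
    const std::locale custom(std::locale::classic(), new CommaDecimal);
    const std::locale previous = std::locale::global(custom);
    const bool unchanged = (early.getloc() == previous);
    std::locale::global(previous);  // restore so the demo has no side effects
    return unchanged;
}
```

The same imbue call applies to wcin and to a wfstream; for a file stream it is usually done before the file is opened.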
I would like to add the following:
It is almost never a good idea to set the global locale: it might break other parts of the program or third-party libraries, you never know.
std::setlocale and locale::global do slightly different things: locale::global resets not only the global C++ locale but also the C locale (the one that std::setlocale sets, not to be confused with the classic "C" locale). So you should call them in the opposite order if you want the C++ locale set to Chinese_China and the C locale set to chinese-simplified.
First
locale::global(locale("Chinese_China"));
And then
setlocale(LC_ALL, "Chinese-simplified");
Try locale::global(locale("Chinese_China.936")); or locale::global(locale(""));
And for LC_ALL "chinese-simplified" or "chs"
If using Vladislav's answer does not solve this, take a look at the answer to stl - Shift-JIS decoding fails using wifstrem in Visual C++ 2013 - Stack Overflow:
const int oldMbcp = _getmbcp();
_setmbcp(936);
const std::locale locale("Chinese_China.936");
_setmbcp(oldMbcp);
There appears to be a bug in Visual Studio's implementation of locales. See also c++ - double byte character sequence conversion issue in Visual Studio 2015 - Stack Overflow.

How to implement diacritics in c++?

I need help getting words from a .txt file which also contains diacritics. (So there are words containing ěščř etc. By the way, those are Czech diacritics, if that helps.)
My function finds the words I type, but it won't find words containing diacritics that I type in the console.
I think I have to set something in my Microsoft Visual C++ 2010, but I'm not sure what and where. In case I'm wrong, here is the function.
bool find(char typedword[50])
{
bool found = false;
char * word = new char [50];
fstream dictionary;
dictionary.open("Dictionary.txt", ios::in);
while (dictionary >> word)
{
if (strcmp(typedword, word) == 0)
{
found = true;
break;
}
}
dictionary.close();
if (found == true)
return true;
else
return false;
}
Thank you for all your help!
You need locale support, so that sequences of combining characters and the composite equivalent compare equal.
The portable way is to call setlocale and use strcoll instead of strcmp.
The Windows way is to use CompareStringEx (which automatically uses OS locale settings) instead of strcmp. NormalizeString may also be helpful.
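The portable route could look roughly like this. It is a sketch only: containsWord is a hypothetical replacement for the find function in the question, it reads from any std::istream so that it can be tried without a file, and it assumes the program calls std::setlocale(LC_ALL, "") once at startup. Whether a combining sequence and its precomposed equivalent actually compare equal depends on the platform's collation data, and the typed word and the file still have to be in the same encoding.

```cpp
#include <clocale>
#include <cstring>
#include <istream>
#include <sstream>  // only needed for the in-memory usage shown below
#include <string>

// Returns true if typedWord matches a whitespace-separated word in dictionary,
// comparing with the collation rules of the current C locale.
bool containsWord(std::istream& dictionary, const std::string& typedWord) {
    std::string word;  // std::string avoids the fixed 50-char buffer
    while (dictionary >> word) {
        if (std::strcoll(typedWord.c_str(), word.c_str()) == 0) {
            return true;
        }
    }
    return false;
}
```

It can be called with a std::ifstream opened on Dictionary.txt, or with a std::istringstream for a quick check.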