C++ How to check if letter isalpha (not latin alphabet) - c++

I need to check whether a character is a letter. I tried the isalpha() function, but if I pass a non-Latin letter (for example ą, č, ę, ė, į, š, ų, ū, ž) I get an error that seems to say that isalpha only accepts chars whose character code is between 0 and 255. Is there any way to overcome this problem?

You can use a locale version of std::isalpha. Taking an example from the linked reference:
#include <iostream>
#include <locale>
int main()
{
    const wchar_t c = L'\u042f'; // cyrillic capital letter ya
    std::locale loc1("C");
    std::cout << "isalpha('Я', C locale) returned "
              << std::boolalpha << std::isalpha(c, loc1) << '\n';
    std::locale loc2("en_US.UTF8");
    std::cout << "isalpha('Я', Unicode locale) returned "
              << std::boolalpha << std::isalpha(c, loc2) << '\n';
}
Output:
isalpha('Я', C locale) returned false
isalpha('Я', Unicode locale) returned true

Related

setw and accentuated characters

I want to display a "pretty" list of countries and their ISO currency codes in C++.
The issue is that my data is in French and contains accented characters. That means Algeria is actually written "Algérie" and Sweden becomes "Suède".
map<string, string> currencies; /* ISO currency code -> country name, e.g. CAD -> Canada */
for (auto &cursor: currencies)
    cout << setw(15) << left << cursor.second << right << " - " << cursor.first << endl;
If the map contains Algeria, Canada and Sweden the result comes out something like this:
Canada          - CAD
Algérie        - DZD
Suède          - SEK
Do you see how Algeria and Sweden are not "pretty"? That's because even though "Algérie" has 7 visible characters and "Suède" has 5, each one "counts" as one more for setw. The "é" in "Algérie" and the "è" in "Suède" each "count" as two characters, because they are "special" accented characters.
Is there an elegant and simple way to make sure that DZD and SEK get automatically aligned with CAD?
Convert to using std::wstring instead of std::string
Convert to using wide string constants (L"stuff" vs "stuff")
Convert to using std::wcout instead of std::cout
Use setlocale to set a UTF-8 locale
Use wcout.imbue to configure wcout for a UTF-8 locale
Example:
#include <clocale>
#include <map>
#include <string>
#include <iostream>
#include <iomanip>
#include <locale>
int main() {
    std::setlocale(LC_ALL, "en_US.utf8");
    std::locale loc("en_US.UTF-8");
    std::wcout.imbue(loc);
    std::map<std::wstring, std::wstring> dict
        { {L"Canada",L"CAD"}, {L"Algérie",L"DZD"}, {L"Suède",L"SEK"} };
    for (const auto& [key, value]: dict) {
        std::wcout << std::setw(10) << key << L" = " << value << std::endl;
    }
}

Character matches neither space nor non-space - whats going on?

It seems that I can match higher-order Unicode on a char-by-char basis, but character classes/properties do not work well.
I created this sample program (on Windows):
#include <iostream>
#include <string>
#include <regex>
int main()
{
    std::wregex re(L"\\S+");
    std::wstring c(L"c");
    std::wstring imp_smiley(L"\U0001F608");//gets encoded as UTF16
    std::cout << std::boolalpha << "c: " << std::regex_match(c, re) << std::endl;
    std::cout << std::boolalpha << "imp_smiley: " << std::regex_match(imp_smiley, re) << std::endl;
}
What is weird is that the 'imp smiley' char matches neither \S (non-whitespace) nor \s (whitespace). I would have expected it to be treated as non-whitespace.
What is going on?
Update
Using [^\s]+ instead of \\S+ does make it match. It seems that UTF-16 is not being recognized (or normalized).

Where do we really need to use wide character stream wcout?

I can't see the point of using std::wcout. As far as I understand, the wide stream object corresponds to a C wide-oriented stream (ISO C 7.19.2/5). When do we really need to use it in practice? I'm pretty sure it isn't suited to outputting a character from an implementation's wide character set N3797::3.9.1/5 [basic.fundamental], because
#include <iostream>
#include <locale>
int main()
{
    std::locale loc = std::locale ("en_US.UTF-8");
    std::wcout.imbue(loc);
    std::cout << "ي" << std::endl; // OK!
    std::wcout << "ي" << std::endl; // Empty string
    std::wcout << "L specifier = " << L"ي" << std::endl; // J
    std::wcout << "u specifier = " << u"ي" << std::endl; // 0x400,eac
    std::wcout << "u8 specifier = " << u8"ي" << std::endl; // empty string
}
We can see that wcout's operator<< didn't print these characters correctly, whereas cout's operator<< handles them well. I've also checked wcout with other characters such as
'л', 'ਚੰ', 'ਗਾ', 'კ', 'ა', 'რ', 'გ', 'ი' and so on, but it prints only Latin characters and numbers correctly.