setw and accentuated characters

I want to display a "pretty" list of countries and their ISO currency codes in C++.
The issue is that my data is in French and contains accented characters. That means that Algeria is actually written "Algérie" and Sweden becomes "Suède".
map<string, string> currencies; /* key: ISO currency code, value: country name, e.g. CAD/Canada */
for (auto &cursor : currencies)
    cout << setw(15) << left << cursor.second << right << " - " << cursor.first << endl;
If the map contains Algeria, Canada and Sweden the result comes out something like this:
Canada          - CAD
Algérie        - DZD
Suède         - SEK
Do you see how Algeria and Sweden are not "pretty"? That's because even though "Algérie" has 7 visible characters and "Suède" has 5, each "counts" as one more for setw. The "é" in "Algérie" and the "è" in "Suède" each "count" as two characters, because they are "special" accented characters.
Is there an elegant and simple way to make sure that DZD and SEK get automatically aligned with CAD?

Convert to using std::wstring instead of std::string
Convert to using wide string constants (L"stuff" vs "stuff")
Convert to using std::wcout instead of std::cout
Use setlocale to set a UTF-8 locale
Use wcout.imbue to configure wcout for a UTF-8 locale
Example:
#include <clocale>
#include <map>
#include <string>
#include <iostream>
#include <iomanip>
#include <locale>
int main() {
    // Set the C locale and imbue wcout with a UTF-8 locale.
    std::setlocale(LC_ALL, "en_US.utf8");
    std::locale loc("en_US.UTF-8");
    std::wcout.imbue(loc);
    std::map<std::wstring, std::wstring> dict
        { {L"Canada",L"CAD"}, {L"Algérie",L"DZD"}, {L"Suède",L"SEK"} };
    // setw counts wide characters here, so "é" and "è" take one position each.
    for (const auto& [key, value]: dict) {
        std::wcout << std::setw(10) << key << L" = " << value << std::endl;
    }
}

Related

C++ To use correctly cout wcout ifstream to read text file with accented characters

I want to read a UTF-8 encoded text file on Ubuntu 20.04.
I imbued wcout, cout and ifstream with the fr_FR.UTF-8 locale.
I thought that just adding the locale would be enough.
The output gives this:
Here is my code:
...
#include <iostream>
#include <fstream>
#include <ctime>
#include <cstdlib>
#include <vector>
#include <string>
#include <locale>
#include <codecvt>
int main(int argc, char** argv){
    int nbreLigne;
    std::vector<std::wstring> dico;
    std::string path("liste_test.txt");
    std::wstring ligne;
    std::locale loc("fr_FR.UTF-8");
    std::cout.imbue(loc);
    std::wcout.imbue(loc);
    std::wifstream file(path.c_str(), std::ios::in);
    file.imbue(loc);
    std::cout << "Path = " << path << std::endl;
    std::cout << "1- locale wifstream : " << file.getloc().name() << std::endl;
    std::cout << "2- locale wcout : " << std::wcout.getloc().name() << std::endl;
    std::cout << "3- locale cout : " << std::cout.getloc().name() << std::endl;
    /* No compile error, but it does not seem to have any effect
    file.imbue(std::locale(file.getloc(), new std::codecvt_utf8<wchar_t, 0x10ffff, std::consume_header>));
    std::cout << "1- Variable de localisation : " << file.getloc().name();
    */
    if (file){
        // count the lines
        while (std::getline(file, ligne)){
            nbreLigne++;
            dico.push_back(ligne);
            /*
            Segmentation fault (core dumped) if this line is enabled
            std::wcout << dico[nbreLigne] << std::endl;
            */
        }
        std::cout << "Total lines number = " << nbreLigne << std::endl;
    }
    else{
        std::cout << "ERREUR: Impossible d'ouvrir le fichier." << std::endl;
    }
    std::cout << "-------------------" << std::endl;
    std::cout << "Lecture de la variable dico" << std::endl;
    std::cout << std::endl;
    for(int i = 0; i < nbreLigne; i++){
        std::wcout << dico[i] << std::endl;
    }
...
How do I correctly use a locale with cout, wcout and ifstream?

Put simply, there is no universal way to handle accented characters in C++, or in most programming languages. Only ASCII is nearly universal, and it covers only unaccented English letters. Over time, several solutions for language-specific characters have appeared, from Windows code pages through wide characters (std::wcout is for those) to UTF-8.
Your problem is not your program (except that it should use only std::cout in this case) but your dictionary, which uses a different character set than your terminal.
Solving character-set problems is hard and tedious. In your case, it is worth rewriting the dictionary by hand in UTF-8 or converting it with iconv(1). In real projects, you would use an internationalization (i18n) tool such as GNU gettext to handle this burden for you. Today, most modern systems use UTF-8.
Also, setting the locale on a C++ stream only changes how the program formats values, such as the decimal separator used when you << a float. It cannot change the console's locale, because a standard stream such as std::cin is not necessarily connected to a terminal and can behave in unexpected ways.

isalpha() function lets through Latin letters although the locale is set to Russian [duplicate]

I need to check whether my character is a letter. I tried using the isalpha() function; however, if I pass a non-Latin letter (for example ą, č, ę, ė, į, š, ų, ū, ž), I get an error that seems to state that isalpha accepts only chars whose code is between 0 and 255. Is there any way to overcome this problem?

You can use a locale version of std::isalpha. Taking an example from the linked reference:
#include <iostream>
#include <locale>
int main()
{
    const wchar_t c = L'\u042f'; // cyrillic capital letter ya
    std::locale loc1("C");
    std::cout << "isalpha('Я', C locale) returned "
              << std::boolalpha << std::isalpha(c, loc1) << '\n';
    std::locale loc2("en_US.UTF8");
    std::cout << "isalpha('Я', Unicode locale) returned "
              << std::boolalpha << std::isalpha(c, loc2) << '\n';
}
Output:
isalpha('Я', C locale) returned false
isalpha('Я', Unicode locale) returned true

print to output at specific location on line in c++

I am trying to print at a uniform position in a line after printing out a header. Here's an example:
PHRASE              TYPE
"hello there"    => greeting
"yo"             => greeting
"where are you?" => question
"It's 3:00"      => statement
"Wow!"           => exclamation
Assume each of these are stored in a std::map<string, string>, where key = phrase and value = type. My issue is that simply using tabs is dependent on the console or text editor that I view the output in. If the tab width is too small I won't know for sure where it will be printed. I have tried using setw, but that only prints the separator ("=>") a fixed distance from the end of the phrase. Is there a simple way to do this?
NOTE Assume for now that we just always know that the phrase will not be longer than, say, 16 characters. We don't need to account for what to do if it is.

Use std::left and std::setw:
std::cout << std::left; // This is "sticky", but setw is not.
std::cout << std::setw(16) << phrase << " => " << type << "\n";
For example:
#include <iostream>
#include <string>
#include <iomanip>
#include <map>
int main()
{
    std::map<std::string, std::string> m;
    m["hello there"] = "greeting";
    m["where are you?"] = "question";
    std::cout << std::left;
    for (std::map<std::string, std::string>::iterator i = m.begin();
         i != m.end();
         i++)
    {
        std::cout << std::setw(16)
                  << std::string("\"" + i->first + "\"")
                  << " => "
                  << i->second
                  << "\n";
    }
    return 0;
}
Output:
"hello there"    => greeting
"where are you?" => question
See http://ideone.com/JTv6na for demo.

printf("\"%s\"%*c => %s",
       it->first.c_str(),
       std::max(0, 16 - static_cast<int>(it->first.size())), // cast: size() is unsigned
       ' ',
       it->second.c_str());
The same idea as Peter's solution, but puts the padding outside the quotes. It uses %c with a length argument to insert padding.

If you're not averse to C-style printing, printf is great for this sort of thing, and much more readable:
printf("\"%16s\" => %s\n", it->first.c_str(), it->second.c_str());
There's nothing wrong with using printf and friends in a C++ program; just be careful when mixing iostreams and stdio. You can always sprintf into a buffer, then output that with iostreams.

You might find this function useful:
#include <iostream>
#include <iomanip>
#include <string>
// Prints s left-aligned in a field of the given width, then restores the stream's flags.
void printRightPadded(std::ostream &stream,const std::string &s,size_t width)
{
    std::ios::fmtflags old_flags = stream.setf(std::ios::left);
    stream << std::setw(width) << s;
    stream.flags(old_flags);
}
You could use it like this:
void
printKeyAndValue(
    std::ostream &stream,
    const std::string &key,
    const std::string &value
)
{
    printRightPadded(stream,"\"" + key + "\"",18);
    stream << " => " << value << "\n";
}

If you can't make it work with setw, a simple alternative is to pad every phrase with spaces so that they are all 16 characters long.

I personally find the C-style printing more readable for formatted printing. Using printf you could also handle the column width using the * formatter.
#include <cstdio>
int main() {
    printf("%-*s%-*s\n", 10, "Hello", 10, "World");
    printf("%-*s%-*s\n", 15, "Hello", 15, "World");
    // in the above, '-' left aligns the field
    // '*' denotes that the field width is a parameter specified later
    // ensure that the width is specified before what it is used to print
}
Output
Hello     World
Hello          World

Get QString length (in characters, not bytes)

I need to get the actual character count (not byte count), similar to what is output when getting string length in V8.
This is necessary for use with Twitter, which goes by character count no matter the language used, even with UTF-8 (it does NOT go by byte length).
Ex:
in chrome/chromium js console, or in nodejs:
> "Schöne Grüße".length
< 12
In Qt 4.8.2, trying QString someStr = "Schöne Grüße"; cout << someStr.length() will output 15, which is not what I'm aiming for.

If you really want to count grapheme clusters (i.e. the user-perceived characters) instead of code units, you need QTextBoundaryFinder. Here's an example of use:
#include <iostream>
#include <QTextBoundaryFinder>
#include <QString>
int main()
{
    const QString s=QString::fromUtf8(u8"abc\U00010139def\U00010102g");
    std::cout << "String: \"" << s.toStdString() << "\"\n";
    std::cout << "Code unit count : " << s.length() << "\n";
    QTextBoundaryFinder tbf(QTextBoundaryFinder::Grapheme, s);
    int count=0;
    while(tbf.toNextBoundary()!=-1)
        ++count;
    std::cout << "Grapheme cluster count: " << count << "\n";
}
Output:
String: "abc𐄹def𐄂g"
Code unit count : 11
Grapheme cluster count: 9

I believe you need to construct it using the specific fromUtf8 static method:
QString s = QString::fromUtf8("Schöne Grüße");
