How to output unicode characters in C/C++ [duplicate] - c++

This question already has answers here:
Output unicode strings in Windows console app
(16 answers)
Closed 9 years ago.
I have problems outputting Unicode characters in the Windows console.
I am using Windows XP and Code::Blocks 12.11 with the mingw32-g++ compiler.
What is the proper way to output Unicode characters to the Windows console in C or C++?
This is my C++ code:
#include <iostream>
#include <string>
using namespace std;
int main()
{
cout << "šđč枊ĐČĆŽ" << endl; // doesn't work
string s = "šđč枊ĐČĆŽ";
cout << s << endl; // doesn't work
return 0;
}
Thanks in advance. :)

Most of those characters take more than a byte to encode, but std::cout's currently imbued locale will only output ASCII characters. For that reason you're probably seeing a lot of weird symbols or question marks in the output stream. You should imbue std::wcout with a locale that uses UTF-8 since these characters are not supported by ASCII:
// <locale> is required for this code.
std::locale::global(std::locale("en_US.utf8"));
std::wcout.imbue(std::locale());
std::wstring s = L"šđč枊ĐČĆŽ";
std::wcout << s;
For Windows systems you will need the following code:
#include <iostream>
#include <string>
#include <fcntl.h>
#include <io.h>
int main()
{
    _setmode(_fileno(stdout), _O_WTEXT);
    std::wstring s = L"šđč枊ĐČĆŽ";
    std::wcout << s;
    return 0;
}
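Another approach that is sometimes used instead of wide streams (not part of the answer above, just a hedged sketch) is to switch the console's output code page to UTF-8 with the Win32 SetConsoleOutputCP call and keep ordinary narrow std::cout output. This assumes the source file is saved as UTF-8 (so the string literal's bytes are UTF-8) and a console font that contains these glyphs; results on very old consoles, such as Windows XP with a raster font, may differ:
#include <windows.h>
#include <iostream>
#include <string>
int main()
{
    SetConsoleOutputCP(CP_UTF8);      // 65001: console interprets output bytes as UTF-8
    std::string s = "šđč枊ĐČĆŽ";     // literal must be UTF-8 encoded (save the source file as UTF-8)
    std::cout << s << '\n';
    return 0;
}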

Related

C++: How to enter Unicode character and return it?

As an exercise before starting Unicode-aware programming, I simply want to enter a Unicode character and return it. However, everything beyond 7-bit ASCII is swallowed (I tried q, ä and ѿ). Did I forget to include something? Is wchar_t not the correct type?
#include <iostream>
#include <stdio.h>
#include <fcntl.h>
#include <io.h>
using namespace std;
int main() {
    _setmode(_fileno(stdout), _O_U16TEXT);
    wchar_t zeichen;
    wcin >> zeichen;
    wcout << zeichen << endl;
    return 0;
}
You need to set the same Unicode mode for stdin as you have configured for stdout:
_setmode(_fileno(stdout), _O_U16TEXT);
_setmode(_fileno(stdin), _O_U16TEXT);
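Put together, a minimal corrected version of the program from the question (same includes, plus the extra _setmode call for stdin) looks like this:
#include <iostream>
#include <stdio.h>
#include <fcntl.h>
#include <io.h>
using namespace std;
int main() {
    _setmode(_fileno(stdout), _O_U16TEXT); // UTF-16 console output
    _setmode(_fileno(stdin), _O_U16TEXT);  // UTF-16 console input
    wchar_t zeichen;
    wcin >> zeichen;                       // characters beyond ASCII are no longer swallowed
    wcout << zeichen << endl;
    return 0;
}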

C++, If...Else with strings and special characters

I was trying to create a simple program to help my students practise German irregular verbs, but I have had problems with special characters and the if statement. Basically it does not recognise ä, ö, ü and ß, so the output is always the else branch ("Nicht Gut"). How could I fix it?
#include <iostream>
#include <string>
#include <conio.h>
#include <locale.h>
using namespace std;
int main () {
    setlocale(LC_CTYPE, "German");
    string Antwort1;
    string Antwort2;
    string Antwort3;
    string str; // declared so the getline call below compiles
    getline(cin, str);
    cout << str;
    cout << "Präsens von BEHALTEN (du)" << endl;
    cin >> Antwort1;
    if (Antwort1 == "behältst") {
        cout << "Gut!" << endl;
    }
    else {
        cout << "Nicht Gut" << endl;
    }
    cout << "Präsens von BEHALTEN (er/sie/es/man) " << endl;
    cin >> Antwort1;
    if (Antwort1 == "behält") {
        cout << "Gut!" << endl;
    }
    else {
        cout << "Nicht Gut" << endl;
    }
    return 0;
}
I tried with
if (Antwort1 == (LC_CTYPE, "German"),"behält")
but that has the opposite effect: every single string I enter is then accepted as correct ("Gut").
My answer applies to the Windows 10 console using the classic default Command Prompt (I haven't tried it with other systems like PowerShell, nor have I tried these experiments on Linux yet).
It seems to me that, as of today (23 February 2022), Windows 10's Command Prompt and the Microsoft C/C++ runtime of VS2019 don't support Unicode UTF-8 well: see, for example, this blog post showing a CRT crash you get when trying to call:
_setmode(_fileno(stdout), _O_U8TEXT);
and printing UTF-8 text using std::cout.
In my experience, you can make Unicode work in Windows Command Prompt using Unicode UTF-16. You can still use UTF-8 in your C++ application, but you have to convert between UTF-8 and UTF-16 at the Windows boundaries.
I modified your code to use Unicode UTF-16, and the code seems to work correctly when compiled with Visual Studio 2019, and executed inside the Windows Command Prompt:
// Used for _setmode calls
#include <fcntl.h>
#include <io.h>
#include <stdio.h>
// Console I/O with Unicode UTF-16 wcin, wcout and wstring
#include <iostream>
#include <string>
using std::wcin;
using std::wcout;
using std::wstring;
int main() {
    // Enable Unicode UTF-16 console input/output
    _setmode(_fileno(stdout), _O_U16TEXT);
    _setmode(_fileno(stdin), _O_U16TEXT);
    wcout << L"Präsens von BEHALTEN (du) \n";
    wstring Antwort1;
    wcin >> Antwort1;
    if (Antwort1 == L"behältst") {
        wcout << L"Gut! \n";
    } else {
        wcout << L"Nicht Gut \n";
    }
}
Note the use of L"..." to represent UTF-16 string literals, and the use of wchar_t-based std::wcout, std::wcin, and std::wstring instead of the char-based std::cout, std::cin and std::string.
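The answer mentions converting between UTF-8 and UTF-16 at the Windows boundaries; a minimal sketch of what that conversion can look like, using the Win32 MultiByteToWideChar and WideCharToMultiByte calls (the helper names Utf8ToUtf16 and Utf16ToUtf8 are mine, not from the answer):
#include <windows.h>
#include <string>
// Hypothetical helpers for crossing the UTF-8 / UTF-16 boundary on Windows.
std::wstring Utf8ToUtf16(const std::string& s)
{
    if (s.empty()) return std::wstring();
    int len = MultiByteToWideChar(CP_UTF8, 0, s.data(), (int)s.size(), nullptr, 0);
    std::wstring out(len, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, s.data(), (int)s.size(), &out[0], len);
    return out;
}
std::string Utf16ToUtf8(const std::wstring& ws)
{
    if (ws.empty()) return std::string();
    int len = WideCharToMultiByte(CP_UTF8, 0, ws.data(), (int)ws.size(), nullptr, 0, nullptr, nullptr);
    std::string out(len, '\0');
    WideCharToMultiByte(CP_UTF8, 0, ws.data(), (int)ws.size(), &out[0], len, nullptr, nullptr);
    return out;
}
With helpers like these, the rest of the program can keep its strings in UTF-8 and only convert to std::wstring just before handing text to wcout or other wide Windows APIs.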

How to read in Hebrew letters in Visual Studio C++

I'm trying to read in a Hebrew text file, do some modifications, then send it to another text file. I've been successful in displaying Hebrew letters using UTF8, but I can't seem to read them in. This code successfully prints out the Hebrew letters to the txt file that it was redirected to, but when I try to read in Hebrew from another text file (that was redirected in) I get random garbage. How do I fix this?
#include <iostream>
#include <string>
#include <fcntl.h>
#include <io.h>
#include <fstream>
using namespace std;
int main() {
    _setmode(_fileno(stdout), _O_U8TEXT);
    wprintf(L"\x05D0"); // works with Courier New in terminal
    wchar_t example[] = L"א";
    wcout << endl << example << endl;
    wstring x;
    getline(wcin, x);
    wcout << x;
    return 0;
}
Output
א
א
×××× ×©××ת ×× × ×שר×× ××××× ×צר××× ×ת ××¢×§× ××ש ××××ª× ××× â¬
The problem has been figured out; it was almost what Barmak Shemirani said. I put in _setmode(_fileno(stdin), _O_U16TEXT); and changed my output to U16, but still got garbage; then I changed them both to U8 and I was able to read in and out perfectly.
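A minimal sketch of the setup the asker describes as working (both streams in UTF-8 mode with wide-character I/O, used with redirected text files; the loop and file names are illustrative, not from the question):
#include <iostream>
#include <string>
#include <fcntl.h>
#include <io.h>
int main() {
    _setmode(_fileno(stdin), _O_U8TEXT);   // decode UTF-8 input into wchar_t
    _setmode(_fileno(stdout), _O_U8TEXT);  // encode wchar_t output back to UTF-8
    std::wstring line;
    // e.g. run as: prog.exe < hebrew_in.txt > hebrew_out.txt
    while (std::getline(std::wcin, line))
        std::wcout << line << L'\n';
    return 0;
}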

type in special characters and save in file [duplicate]

This question already has answers here:
Output unicode strings in Windows console app
(16 answers)
Closed 4 years ago.
I want to use a plain text file to save input from a user and read it out later. In the application, German and Spanish special characters can occur. Unfortunately the code below doesn't save those characters properly in the file. What is the best way to solve this issue?
I solved this issue in a similar application in C by saving to a .bin file instead of a .txt, but isn't there a better solution in C++?
#include <string>
#include <fstream>
#include <iostream>
int main() {
std::string s = "abcöäüÑ"; //(alt + 165 for Ñ)
std::ofstream ofs{ "test.txt" };
ofs << s <<'\n'; //this works fine "abcöäüÑ"
std::string s2;
std::cin >> s2; //typeing in the app abcöäüÑ (alt+165 for Ñ)
// i use a windows system with a german keyboard setting
ofs << s2 <<'\n'; //this doesn't it gets "abc”„¥"
}
I use a Windows 7 64-bit system with Visual Studio 2017 and a German keyboard setting.
The easiest solution (which is deprecated) is to use the ANSI code page for German:
setlocale(LC_ALL, "gr");
cout << "abcöäüÑ\n";
ofstream fout("ansi.txt");
fout << "abcöäüÑ\n";
This will only work for a limited character set; it's relatively safe if you stick to Western Latin characters. Maybe this is what you have done in your C code. It doesn't have much to do with saving the file as binary or text.
On Windows it is recommended to use Unicode with wide string functions. Example:
#include <iostream>
#include <string>
#include <fstream>
#include <codecvt>
#include <io.h>
#include <fcntl.h>
int main()
{
    _setmode(_fileno(stdout), _O_U16TEXT); // wcout instead of cout
    _setmode(_fileno(stdin), _O_U16TEXT);  // wcin instead of cin
    std::locale loc_utf16(std::locale(), new std::codecvt_utf16<wchar_t>);
    std::wofstream fout(L"utf16.txt", std::ios::binary);
    if (fout)
    {
        fout.imbue(loc_utf16);
        fout << L'\xFEFF'; // insert optional BOM for UTF-16
        fout << L"abcöäüÑ ελληνική\r\n";
        fout.close();
    }
    std::wifstream fin(L"utf16.txt", std::ios::binary);
    if (fin)
    {
        fin.imbue(loc_utf16);
        fin.seekg(2, std::ios::beg); // skip the optional BOM if it was added
        std::wstring ws;
        while (std::getline(fin, ws)) std::wcout << ws << std::endl;
        fin.close();
    }
    return 0;
}
The disadvantage of UTF-16 is that programs on systems like Linux may have a hard time with this format. Some people save the file in UTF-8 instead, because UTF-8 is more widely supported on other systems.
Windows itself is UTF-16 based, so you have to read and display the input in UTF-16, but you can read/write the file in UTF-8 format. Example:
std::wcout << "enter:";
std::wstring sw;
std::wcin >> sw;
std::locale loc_utf8(std::locale(), new std::codecvt_utf8<wchar_t>);
std::wofstream fout(L"utf8.txt", std::ios::binary);
if(fout)
{
    fout.imbue(loc_utf8);
    fout << sw << L"\r\n";
    fout << L"abcöäüÑ ελληνική\r\n";
}
Here the binary flag does matter.
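As a reading counterpart (my own sketch following the same codecvt_utf8 approach, not part of the answer), the UTF-8 file can be read back into wide strings like this:
#include <iostream>
#include <fstream>
#include <string>
#include <locale>
#include <codecvt>
#include <io.h>
#include <fcntl.h>
int main()
{
    _setmode(_fileno(stdout), _O_U16TEXT); // wide console output, as in the answer above
    std::locale loc_utf8(std::locale(), new std::codecvt_utf8<wchar_t>);
    std::wifstream fin(L"utf8.txt", std::ios::binary);
    if (fin)
    {
        fin.imbue(loc_utf8);
        std::wstring ws;
        while (std::getline(fin, ws)) std::wcout << ws << L'\n';
    }
    return 0;
}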

C++ setlocale for any region

I've written a little program for testing purposes, because when using cout the German letters ü ö ä ß were not displayed as they should be; rubbish was printed to the console instead. However, these lines
#include <iostream>
#include <locale>
using namespace std;
int main()
{
setlocale(LC_ALL, "German");
cout << "üüü ööö äää ßßß" << '\n';
system("pause");
return 0;
}
have solved this problem, at least as far as the German letters go. When I tried the same for Russian, i.e.
#include <iostream>
#include <locale>
using namespace std;
int main()
{
setlocale(LC_ALL, "RUSSIAN");
cout << "Кирилица" << '\n';
system("pause");
return 0;
}
this doesn't work anymore. What is going wrong here? Am I missing something about the setlocale function? My goal is to adapt the respective program to the writing system that is used, for example Cyrillic like above, or Chinese, or whatever.
FOR GERMAN -> std::setlocale(LC_ALL, "de_DE");
FOR RUSSIAN -> std::setlocale(LC_ALL, "rus");
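A minimal sketch combining those two calls into one test program (locale name availability and the resulting console output depend on the Windows CRT version, the source file encoding, and the console code page, so treat this only as a starting point):
#include <clocale>
#include <iostream>
using namespace std;
int main()
{
    setlocale(LC_ALL, "de_DE");   // German locale; "German" also worked in the asker's own test
    cout << "üüü ööö äää ßßß" << '\n';
    setlocale(LC_ALL, "rus");     // Russian ANSI code page
    cout << "Кирилица" << '\n';
    return 0;
}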