The following code:
#include <iostream>
using std::wcin;
using std::wcout;
using std::locale;
int main()
{
    locale::global(locale("Portuguese_Brazil"));
    wcout << "wcin Test using \"ção\": "; // shows that wcout works properly
    wchar_t wcinTest[] = L"";
    wcin >> wcinTest;
    wcout << wcinTest << " should be \"ção\".";
    return 0;
}
Results in:
wcin Test using "ção": ção
╬Æo should be "ção".
The ╬ character is U+2021 (8225), and ç is U+00E7 (231).
I changed the multi-byte option and toggled UNICODE in the project properties. Nothing worked.
I already set the console font to Consolas, a TrueType font capable of displaying the ç character correctly.
I'd like this to be as simple and reproducible as possible, so it can serve as a standard practice for future Unicode console applications.
Any ideas?
wcinTest is a wchar_t buffer of length 1; you overflow it when you read into it. Use a std::wstring instead.
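For illustration, a minimal sketch of that fix (a sketch only; the rest of the program is kept exactly as in the question):

#include <iostream>
#include <string>
using std::wcin;
using std::wcout;
using std::wstring;
using std::locale;

int main()
{
    locale::global(locale("Portuguese_Brazil"));
    wcout << "wcin Test using \"ção\": ";
    wstring wcinTest; // grows as needed; no fixed-size buffer to overflow
    wcin >> wcinTest;
    wcout << wcinTest << " should be \"ção\".";
    return 0;
}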
This finally worked:
#include <iostream>
#include <string>
#include <Windows.h>
using std::cin;
using std::cout;
using std::string;
int main()
{
    SetConsoleOutputCP(1252);
    SetConsoleCP(1252);
    cout << "wcin Test using \"ção\": "; // shows that cout works properly
    string wcinTest;
    cin >> wcinTest;
    cout << wcinTest << " should be \"ção\".";
    return 0;
}
I'm too much of a newbie to understand why I need both SetConsoleOutputCP and SetConsoleCP. I thought maybe SetConsoleCP alone would fix everything, but no, I need both: SetConsoleOutputCP fixed cout, and SetConsoleCP fixed cin. (The console keeps separate input and output code pages, so each has to be set individually.)
Thanks anyway @StoryTeller
Related
I was trying to create a simple program to help my students train German irregular verbs, but I have had problems with special characters and the if statement. Basically it does not recognise ä, ö, ü and ß, so the output is always the else branch ("Nicht Gut"). How could I fix it?
#include <iostream>
#include <string>
#include <conio.h>
#include <locale.h>
using namespace std;

int main() {
    setlocale(LC_CTYPE, "German");
    string Antwort1;
    string Antwort2;
    string Antwort3;

    cout << "Präsens von BEHALTEN (du)" << endl;
    cin >> Antwort1;
    if (Antwort1 == "behältst") {
        cout << "Gut!" << endl;
    }
    else {
        cout << "Nicht Gut" << endl;
    }

    cout << "Präsens von BEHALTEN (er/sie/es/man) " << endl;
    cin >> Antwort1;
    if (Antwort1 == "behält") {
        cout << "Gut!" << endl;
    }
    else {
        cout << "Nicht Gut" << endl;
    }
    return 0;
}
I tried
if (Antwort1 == (LC_CTYPE, "German"), "behält")
but that causes the contrary effect: every single string I write is then accepted ("Gut").
My answer applies to the Windows 10 console using the classic default Command Prompt (I haven't tried it with other systems like PowerShell, nor have I tried these experiments on Linux yet).
It seems to me that, as of today (23 February 2022), Windows 10's Command Prompt and the Microsoft C/C++ Runtime of VS2019 don't support Unicode UTF-8 well: see, for example, this blog post showing a CRT crash you get when trying to call:
_setmode(_fileno(stdout), _O_U8TEXT);
and printing UTF-8 text using std::cout.
In my experience, you can make Unicode work in Windows Command Prompt using Unicode UTF-16. You can still use UTF-8 in your C++ application, but you have to convert between UTF-8 and UTF-16 at the Windows boundaries.
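As an aside, here's a minimal sketch of such a boundary conversion, assuming you keep UTF-8 std::strings inside the application (MultiByteToWideChar is the actual Win32 API; the helper name is just for illustration):

#include <string>
#include <Windows.h>

// Illustrative helper: decode a UTF-8 std::string into a UTF-16 std::wstring
std::wstring Utf8ToUtf16(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();
    // First call: ask how many wchar_ts the converted text needs
    const int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                        utf8.data(), static_cast<int>(utf8.size()),
                                        nullptr, 0);
    if (len <= 0) return std::wstring(); // invalid UTF-8 input
    std::wstring utf16(len, L'\0');
    // Second call: perform the actual conversion into the buffer
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        utf8.data(), static_cast<int>(utf8.size()),
                        &utf16[0], len);
    return utf16;
}

WideCharToMultiByte with CP_UTF8 handles the opposite direction.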
I modified your code to use Unicode UTF-16, and the code seems to work correctly when compiled with Visual Studio 2019, and executed inside the Windows Command Prompt:
// Used for _setmode calls
#include <fcntl.h>
#include <io.h>
#include <stdio.h>
// Console I/O with Unicode UTF-16 wcin, wcout and wstring
#include <iostream>
#include <string>
using std::wcin;
using std::wcout;
using std::wstring;
int main() {
    // Enable Unicode UTF-16 console input/output
    _setmode(_fileno(stdout), _O_U16TEXT);
    _setmode(_fileno(stdin), _O_U16TEXT);

    wcout << L"Präsens von BEHALTEN (du) \n";

    wstring Antwort1;
    wcin >> Antwort1;

    if (Antwort1 == L"behältst") {
        wcout << L"Gut! \n";
    } else {
        wcout << L"Nicht Gut \n";
    }
}
Note the use of L"..." to represent UTF-16 string literals, and the use of wchar_t-based std::wcout, std::wcin, and std::wstring instead of the char-based std::cout, std::cin and std::string.
I'm trying to read in a Hebrew text file, do some modifications, then send it to another text file. I've been successful in displaying Hebrew letters using UTF-8, but I can't seem to read them in. This code successfully prints the Hebrew letters to the txt file it was redirected to, but when I try to read in Hebrew from another text file (redirected in), I get random garbage. How do I fix this?
#include <iostream>
#include <string>
#include <fcntl.h>
#include <io.h>
#include <fstream>
using namespace std;
int main() {
    _setmode(_fileno(stdout), _O_U8TEXT);
    wprintf(L"\x05D0"); // works with Courier New in the terminal
    wchar_t example[] = L"א";
    wcout << endl << example << endl;
    wstring x;
    getline(wcin, x);
    wcout << x;
    return 0;
}
Output
א
א
×××× ×©××ת ×× × ×שר×× ××××× ×צר××× ×ת ××¢×§× ××ש ××××ª× ××× â¬
The problem has been figured out. It was almost what Barmak Shemirani said: I put in _setmode(_fileno(stdin), _O_U16TEXT); and changed my output to U16, and still got garbage. Then I changed them both to U8 and I was able to read in and out perfectly.
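For reference, a minimal sketch of that working arrangement (both streams in UTF-8 translation mode; note that with _O_U8TEXT only the wide-character functions may be used):

#include <fcntl.h>
#include <io.h>
#include <stdio.h>
#include <iostream>
#include <string>
using namespace std;

int main() {
    // Put both standard streams into UTF-8 translation mode
    _setmode(_fileno(stdout), _O_U8TEXT);
    _setmode(_fileno(stdin), _O_U8TEXT);
    wstring x;
    getline(wcin, x); // UTF-8 input is decoded into wchar_t
    wcout << x;       // and encoded back to UTF-8 on output
    return 0;
}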
I've written a little program for testing purposes, because when using cout the German letters ü ö ä ß were not displayed as they should be; rubbish was printed to the console instead. However, these lines
#include <iostream>
#include <locale>
using namespace std;
int main()
{
    setlocale(LC_ALL, "German");
    cout << "üüü ööö äää ßßß" << '\n';
    system("pause");
    return 0;
}
have solved this problem, at least as far as the German letters go. When I tried the same for Russian, i.e.
#include <iostream>
#include <locale>
using namespace std;
int main()
{
    setlocale(LC_ALL, "RUSSIAN");
    cout << "Кирилица" << '\n';
    system("pause");
    return 0;
}
this doesn't work anymore. What is going wrong here? Am I missing something about the setlocale function? My goal is to adapt the respective program to the writing system that is used, for example Cyrillic as above, or Chinese, or whatever.
FOR GERMAN -> std::setlocale(LC_ALL, "de_DE");
FOR RUSSIAN -> std::setlocale(LC_ALL, "rus");
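For instance, a minimal sketch of the Russian case (an assumption on my part: with the MSVC runtime, "rus" maps to a Russian locale such as Russian_Russia.1251, and whether the literal displays correctly still depends on the source file's encoding matching the console code page):

#include <clocale>
#include <iostream>
using namespace std;

int main()
{
    // "rus" is the abbreviated Windows name for the Russian locale
    setlocale(LC_ALL, "rus");
    cout << "Кирилица" << '\n';
    return 0;
}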
On Linux with g++, if I set a UTF-8 global locale, then wcin correctly transcodes UTF-8 to the internal wchar_t encoding.
However, if I use the classic locale and imbue a UTF-8 locale into wcin, this doesn't happen. Input either fails altogether, or each individual byte gets converted to a wchar_t independently.
With clang++ and libc++, neither setting the global locale nor imbuing the locale in wcin works.
#include <iostream>
#include <locale>
#include <string>
using namespace std;
int main() {
    if (true)
        // this works with g++, but not with clang++/libc++
        locale::global(locale("C.UTF-8"));
    else
        // this doesn't work with either implementation
        wcin.imbue(locale("C.UTF-8"));

    wstring s;
    wcin >> s;
    cout << s.length() << " " << (s == L"áéú");
    return 0;
}
The input stream contains only áéú characters. (They are in UTF-8, not any single-byte encoding).
Live demo: one two (I can't reproduce the other behaviour with online compilers).
Is this standard-conforming? Shouldn't I be able to leave the global locale alone and use imbue instead?
Should either of the described behaviours be classified as an implementation bug?
First of all, you should use wcout together with wcin.
Now you have two possible solutions:
1) Deactivate synchronization of the iostream and cstdio streams by calling
ios_base::sync_with_stdio(false);
Note that this should be the first call; otherwise the behavior depends on the implementation.
int main() {
    ios_base::sync_with_stdio(false);
    wcin.imbue(locale("C.UTF-8"));
    wstring s;
    wcin >> s;
    wcout << s.length() << " " << (s == L"áéú");
    return 0;
}
2) Set the C locale and imbue wcout as well:
int main() {
    std::setlocale(LC_ALL, "C.UTF-8");
    wcout.imbue(locale("C.UTF-8"));
    wstring s;
    wcin >> s;
    wcout << s.length() << " " << (s == L"áéú");
    return 0;
}
Tested both of them using ideone; they work fine. I don't have clang++/libc++ with me, so I wasn't able to test this behavior, sorry.
I'm using Qt/C++ on a Linux system. I need to convert a QLineEdit's text to std::wstring and write it into a std::wofstream. It works correctly for ASCII strings, but when I enter any other characters (Arabic or Uzbek), nothing is written to the file (the file's size is 0 bytes).
this is my code:
wofstream customersFile;
customersFile.open("./customers.txt");
std::wstring ws = lne_address_customer->text().toStdWString();
customersFile << ws << ws.length() << std::endl;
Output for John Smith entered in the line edit is John Smith10, but for Unicode strings, nothing.
At first I thought the problem was with QString::toStdWString(), but customersFile << ws.length(); writes the correct length for all strings. So I guess I'm doing something wrong when writing the wstring to the file.
EDIT:
I wrote it again in Eclipse and compiled it with g++ 4.5. The result is the same:
#include <iostream>
#include <string>
#include <fstream>
using namespace std;
int main()
{
    cout << "" << endl; // prints
    wstring ws = L"سلام"; // this is an Arabic "Hello"
    wofstream wf("new.txt");
    if (!wf.bad())
        wf << ws;
    else
        cerr << "some problem";
    return 0;
}
Add
#include <locale>
and at the start of main,
std::locale::global(std::locale(""));
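Putting it together, a minimal sketch of the corrected test program (the unnamed locale "" picks up the encoding from the environment, typically a UTF-8 one on Linux; the stream must be created after the global locale is set so that it inherits it):

#include <iostream>
#include <string>
#include <fstream>
#include <locale>
using namespace std;

int main()
{
    // Adopt the environment's locale before any stream is created
    locale::global(locale(""));
    wstring ws = L"سلام"; // an Arabic "Hello"
    wofstream wf("new.txt");
    if (wf)
        wf << ws; // now transcoded using the global locale's encoding
    else
        cerr << "some problem";
    return 0;
}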