I wrote a small test program because, when using cout, the German letters ü ö ä ß were not displayed correctly; the console printed garbage instead. However, these lines
#include <clocale>   // setlocale
#include <cstdlib>   // system
#include <iostream>
using namespace std;
int main()
{
    setlocale(LC_ALL, "German");
    cout << "üüü ööö äää ßßß" << '\n';
    system("pause");
    return 0;
}
solved this problem, at least as far as the German letters go. When I tried the same for Russian, i.e.
#include <clocale>   // setlocale
#include <cstdlib>   // system
#include <iostream>
using namespace std;
int main()
{
    setlocale(LC_ALL, "RUSSIAN");
    cout << "Кирилица" << '\n';
    system("pause");
    return 0;
}
it no longer works. What is going wrong here? Am I missing something about the setlocale function? My goal is to adapt the program to whichever writing system is in use, for example Cyrillic as above, or Chinese, or anything else.
For German: std::setlocale(LC_ALL, "de_DE");
For Russian: std::setlocale(LC_ALL, "rus");
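Locale names are not portable between C runtimes, so a name that works on one system may be rejected on another. std::setlocale reports rejection by returning a null pointer, which makes it possible to try several spellings in turn. The following is a minimal sketch under that assumption; the helper name set_first_available_locale is hypothetical and not part of any library.

```cpp
#include <clocale>
#include <initializer_list>
#include <string>

// Hypothetical helper: tries each candidate name in order and returns the
// first one std::setlocale accepts, or an empty string if none is accepted.
std::string set_first_available_locale(std::initializer_list<const char*> candidates)
{
    for (const char* name : candidates) {
        // std::setlocale returns a null pointer when the request cannot be honored.
        if (std::setlocale(LC_ALL, name) != nullptr) {
            return name;
        }
    }
    return {};
}
```

A call such as set_first_available_locale({"ru_RU.UTF-8", "Russian", "rus"}) would then pick whichever spelling the current runtime knows. An empty result means none of the names was accepted, which is worth checking before blaming the console.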
I have this simple C++ code that converts uppercase characters to lowercase:
#include <iostream>
#include <fstream>
#include <cwctype>
#include <locale>
#include <string>
int main()
{
    std::wstring input_str = L"İiIı";
    std::locale loc("tr_TR.UTF-8");
    std::wofstream output_file("lowercase_turkish.txt");
    output_file.imbue(loc);
    for (wchar_t& c : input_str) {
        c = std::towlower(c);
    }
    output_file << input_str << std::endl;
    output_file.close();
    return 0;
}
Given the input İiIı, I expect the output iiıı, but instead I get the incorrect output İiiı.
Why is that happening, and how can I solve the problem with minimal changes to the code? I use this code to convert uppercase letters to lowercase in more than 10 languages, and it works well for all of them except Turkish.
I would prefer a solution that is not specific to Turkish.
You need to have the Turkish locale in effect for std::towlower. Otherwise it's using the C locale, which is rather ASCII-centric.
#include <clocale>
#include <cwctype>
#include <fstream>
#include <iostream>
#include <locale>
#include <string>
int main() {
    std::wstring input_str = L"İiIıÇç";
    std::setlocale(LC_ALL, "tr_TR.UTF-8"); // This should impact std::towlower
    std::locale loc("tr_TR.UTF-8");
    std::wofstream output_file("lowercase_turkish.txt");
    output_file.imbue(loc);
    for (wchar_t& c : input_str) {
        c = std::towlower(c);
    }
    output_file << input_str << std::endl;
    output_file.close();
}
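Because std::setlocale changes process-wide state, the call above also affects every later locale-dependent C function in the program. If that is not wanted, one option is to switch the locale only for the duration of the conversion. This is a sketch of such a guard; the class name ScopedCLocale is hypothetical, and it assumes no other thread changes the locale in the meantime.

```cpp
#include <clocale>
#include <string>

// Hypothetical RAII guard: switches the global C locale on construction
// and restores the previous one on destruction.
class ScopedCLocale {
public:
    ScopedCLocale(int category, const char* name)
        : category_(category)
    {
        // Querying with a null pointer returns the current locale name.
        if (const char* current = std::setlocale(category, nullptr)) {
            previous_ = current;
        }
        active_ = std::setlocale(category, name) != nullptr;
    }

    ~ScopedCLocale()
    {
        if (active_ && !previous_.empty()) {
            std::setlocale(category_, previous_.c_str());
        }
    }

    ScopedCLocale(const ScopedCLocale&) = delete;
    ScopedCLocale& operator=(const ScopedCLocale&) = delete;

    // True if the requested locale was accepted by the C runtime.
    bool active() const { return active_; }

private:
    int category_;
    std::string previous_;
    bool active_ = false;
};
```

In the program above, the loop calling std::towlower would sit inside a block that starts with ScopedCLocale guard(LC_ALL, "tr_TR.UTF-8"); checking guard.active() first tells you whether the Turkish locale is installed at all.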
You imbue the locale on the output stream for writing, but it is not used during the conversion to lowercase. std::towlower (which is a C function) uses the global C locale.
https://en.cppreference.com/w/cpp/string/wide/towlower
To fix this, do:
for (auto& c : input_str)
    c = std::tolower<wchar_t>(c, loc); // defined in <locale>
However, the output would be:
iiiı
even though it should be iiıı.
The reason seems to be that Turkish uses the regular Latin I as the uppercase form of its own dotless ı, so the default conversion of I to lowercase i is wrong for Turkish text (regular ASCII characters can be mixed with Turkish-specific Unicode ones).
So you need a Turkish-specific solution:
for (auto& c : input_str)
    c = c == L'I' ? L'ı' : std::tolower<wchar_t>(c, loc);
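Since the question asks for something that still works for the other languages, the special case can be put behind a flag so that one loop serves every language. The sketch below is one way to do that. The function name to_lower and the turkic_i parameter are hypothetical, and mapping İ (U+0130) by hand is an extra precaution on my part in case the locale in use does not handle it.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: lowercases a wide string with the given locale, and
// optionally applies the two Turkic dotted/dotless I mappings by hand.
std::wstring to_lower(std::wstring text, const std::locale& loc, bool turkic_i)
{
    for (wchar_t& c : text) {
        if (turkic_i && c == L'I') {
            c = L'\u0131';            // I -> dotless i
        } else if (turkic_i && c == L'\u0130') {
            c = L'i';                 // dotted capital I -> i
        } else {
            c = std::tolower(c, loc); // locale-aware overload from <locale>
        }
    }
    return text;
}
```

The caller decides when to set turkic_i, for example from the language code of the text being processed (Turkish and Azerbaijani use these mappings), and passes false for every other language.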
I'm trying to read in a Hebrew text file, do some modifications, then send it to another text file. I've been successful in displaying Hebrew letters using UTF8, but I can't seem to read them in. This code successfully prints out the Hebrew letters to the txt file that it was redirected to, but when I try to read in Hebrew from another text file (that was redirected in) I get random garbage. How do I fix this?
#include <iostream>
#include <string>
#include <fcntl.h>
#include <io.h>
#include <fstream>
using namespace std;
int main() {
    _setmode(_fileno(stdout), _O_U8TEXT);
    wprintf(L"\x05D0");//works with courier new in terminal
    wchar_t example[] = L"א";
    wcout << endl << example << endl;
    wstring x;
    getline(wcin, x);
    wcout << x;
    return 0;
}
Output
א
א
×××× ×©××ת ×× × ×שר×× ××××× ×צר××× ×ת ××¢×§× ××ש ××××ª× ××× â¬
The problem has been solved. It was almost what Barmak Shemirani said. I added _setmode(_fileno(stdin), _O_U16TEXT); and changed my output to U16 as well, but still got garbage. Then I changed both to U8, and I was able to read and write perfectly.
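The garbage in the output above has a recognizable shape: every Hebrew letter in the range U+05D0 to U+05EA is encoded in UTF-8 as the byte 0xD7 followed by a second byte, and 0xD7 on its own is the character × in Latin-1. The sketch below reproduces that misreading. It assumes the input file was UTF-8 and was read one byte per character; the function name is hypothetical, and Latin-1 is used instead of the actual console code page to keep the mapping trivial.

```cpp
#include <string>

// Hypothetical helper: shows what a reader sees when each UTF-8 byte is
// treated as one Latin-1 character instead of part of a multi-byte sequence.
std::u32string misread_as_latin1(const std::string& utf8_bytes)
{
    std::u32string result;
    for (unsigned char byte : utf8_bytes) {
        // Latin-1 maps byte values 0x00..0xFF directly to U+0000..U+00FF.
        result.push_back(static_cast<char32_t>(byte));
    }
    return result;
}
```

Feeding it the two bytes of א ("\xD7\x90") yields two code points, U+00D7 (×) and U+0090, which matches the repeated × in the output. Putting stdin into a UTF-8 text mode makes the runtime decode those byte pairs instead.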
I am trying to print some "special" characters (above 127) in the console, but somehow it doesn't get printed.
I have this small code snippet:
#include <iostream>
int main()
{
    std::wcout << L"a■■■■■■■■■■■a■■■■■■■■■■■■■■a" << std::flush;
    return 0;
}
It prints the 'a' but then nothing. It doesn't matter whether I use cout/string or wcout/wstring: with cout I only see "?", and with wcout nothing further is printed because the stream stops.
The ASCII code for this character is 254. What could be happening here? I thought this would be fine to print.
From the answer I linked to in the comments, I think this is your solution:
#include <fcntl.h>
#include <io.h>
_setmode(_fileno(stdout), _O_U8TEXT);
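The snippet above is Windows-only and has to run inside a function before anything is written. A sketch of how it might be wrapped so the same source still compiles elsewhere follows; enable_utf8_stdout and kHasSetmode are hypothetical names. As far as I know, the Microsoft CRT does not support narrow output (std::cout, printf) on a stream that is in a Unicode mode, so after this call only std::wcout or wprintf should be used.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
constexpr bool kHasSetmode = true;
#else
constexpr bool kHasSetmode = false;
#endif

// Hypothetical helper: puts stdout into UTF-8 wide-text mode on Windows.
// Returns false on other platforms, or if the mode switch fails.
bool enable_utf8_stdout()
{
#ifdef _WIN32
    // _setmode returns -1 on failure, otherwise the previous mode.
    return _setmode(_fileno(stdout), _O_U8TEXT) != -1;
#else
    return false;
#endif
}
```

Calling enable_utf8_stdout() as the first statement of main and then writing through std::wcout is the intended usage.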
I have problems with outputting Unicode characters in the Windows console.
I am using Windows XP and Code Blocks 12.11 with mingw32-g++ compiler.
What is the proper way to output unicode characters in Windows console with C or C++?
This is my C++ code:
#include <iostream>
#include <string>
using namespace std;
int main()
{
    cout << "šđč枊ĐČĆŽ" << endl; // doesn't work
    string s = "šđč枊ĐČĆŽ";
    cout << s << endl; // doesn't work
    return 0;
}
Thanks in advance. :)
Most of those characters take more than a byte to encode, but std::cout's currently imbued locale will only output ASCII characters. For that reason you're probably seeing a lot of weird symbols or question marks in the output stream. You should imbue std::wcout with a locale that uses UTF-8 since these characters are not supported by ASCII:
// <locale> is required for this code.
std::locale::global(std::locale("en_US.utf8"));
std::wcout.imbue(std::locale());
std::wstring s = L"šđč枊ĐČĆŽ";
std::wcout << s;
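The claim that these characters need more than one byte can be checked from the code point alone, because the UTF-8 length is determined by fixed ranges. The function below is a small sketch of that rule; the name utf8_length is my own.

```cpp
#include <cstddef>

// Number of bytes UTF-8 uses for one Unicode code point.
constexpr std::size_t utf8_length(char32_t code_point)
{
    return code_point < 0x80    ? 1
         : code_point < 0x800   ? 2
         : code_point < 0x10000 ? 3
                                : 4;
}
```

Every letter in the question's string, such as š (U+0161) or Ž (U+017D), falls in the range U+0080 to U+07FF, so each one takes two bytes in UTF-8.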
For Windows systems you will need the following code:
#include <iostream>
#include <string>
#include <fcntl.h>
#include <io.h>
int main()
{
    _setmode(_fileno(stdout), _O_WTEXT);
    std::wstring s = L"šđč枊ĐČĆŽ";
    std::wcout << s;
    return 0;
}
The following code:
#include <iostream>
using std::wcin;
using std::wcout;
using std::locale;
int main()
{
    locale::global(locale("Portuguese_Brazil"));
    wcout << "wcin Test using \"ção\": "; // shows that wcout works properly
    wchar_t wcinTest[] = L"";
    wcin >> wcinTest;
    wcout << wcinTest << " should be \"ção\".";
    return 0;
}
Results in:
wcin Test using "ção": ção
╬Æo should be "ção".
The ╬ character is U+2021 or 8225, and the ç is U+00E7 or 231.
I changed the multi-byte option, and tried with UNICODE both set and not set in the project properties. Nothing worked.
I already set the console font to Consolas, a TrueType font capable of displaying the ç character correctly.
I'd like this to be as simple and reproducible as possible, to use as a standard practice for future UNICODE console applications.
Any ideas?
wcinTest is a wchar_t buffer of length 1: it only has room for the terminating null character.
You overflow it when you read into it. Use a std::wstring instead.
This finally worked:
#include <iostream>
#include <string>
#include <Windows.h>
using std::cin;
using std::cout;
using std::string;
int main()
{
    SetConsoleOutputCP(1252);
    SetConsoleCP(1252);
    cout << "cin Test using \"ção\": "; // shows that cout works properly
    string wcinTest;
    cin >> wcinTest;
    cout << wcinTest << " should be \"ção\".";
    return 0;
}
I'm too much of a newbie to understand why I need both SetConsoleOutputCP and SetConsoleCP. I thought maybe just SetConsoleCP would fix everything, but no, I need both: SetConsoleOutputCP fixed cout, and SetConsoleCP fixed cin.
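The reason both calls are needed is that the Windows console keeps two independent settings: an input code page, used to translate what is typed, and an output code page, used to translate what is written. SetConsoleCP changes the first and SetConsoleOutputCP the second. A sketch that sets them together follows; set_console_code_pages and kOnWindows are hypothetical names, and the function simply reports failure on other platforms.

```cpp
#ifdef _WIN32
#include <windows.h>
constexpr bool kOnWindows = true;
#else
constexpr bool kOnWindows = false;
#endif

// Hypothetical helper: sets the console input and output code pages to the
// same value. Returns true only if both Windows calls succeed.
bool set_console_code_pages(unsigned code_page)
{
#ifdef _WIN32
    const bool input_ok = SetConsoleCP(code_page) != 0;         // affects cin
    const bool output_ok = SetConsoleOutputCP(code_page) != 0;  // affects cout
    return input_ok && output_ok;
#else
    (void)code_page;
    return false;
#endif
}
```

With this, the two calls at the top of main become set_console_code_pages(1252);. The current values can be read back with GetConsoleCP and GetConsoleOutputCP if they need to be restored on exit.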
Thanks anyway #StoryTeller