I'm trying to learn Unicode programming in Windows.
I have this simple program:
#include <iostream>
#include <string>

int main()
{
    std::wstring greekWord = L"Ελληνικά";
    std::wcout << greekWord << std::endl;
    return 0;
}
However, it outputs nothing. Any ideas how to make it output Greek?
I tried adding non-Greek letters, and that didn't work quite right either.
The first thing to try is to make the program independent of the encoding of the source file, so use Unicode escapes instead of literal Unicode letters:
std::wstring greekWord = L"\u0395\u03BB\u03BB\u03B7\u03BD\u03B9\u03BA\u03AC";
An incorrect source file encoding is only one of many things that could be preventing you from printing Greek. The other obvious issue is your terminal's ability to print Greek letters. If it can't do that, or needs to be set up correctly first, then nothing you do in your program is going to work.
And you probably want to fix the source code encoding issue anyway, so that you can use unescaped literals in your code. But that depends on the compiler/IDE you are using (for example, MSVC accepts a /utf-8 flag and GCC accepts -finput-charset=UTF-8).
If you are sending your output to a normal console, the console usually doesn't support Unicode text like Greek out of the box. Try setting it up for Unicode text, or find another way to output your data, such as text files or a GUI.
There are two ways to do this.
The old, non-standard Microsoft way is as follows:
#include <fcntl.h>
#include <io.h>

int main()
{
    _setmode(_fileno(stdout), _O_U16TEXT); // UTF-16 text mode for stdout
    _setmode(_fileno(stdin), _O_WTEXT);    // wide text mode for stdin
    // your code here
}
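Applied to the original Greek program, a minimal sketch of this approach might look like this (using the Unicode escapes from above, so it doesn't depend on the source encoding):

#include <iostream>
#include <string>
#include <cstdio>
#include <fcntl.h>
#include <io.h>

int main()
{
    _setmode(_fileno(stdout), _O_U16TEXT); // must come before any output
    std::wstring greekWord = L"\u0395\u03BB\u03BB\u03B7\u03BD\u03B9\u03BA\u03AC";
    std::wcout << greekWord << std::endl;
    return 0;
}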
You will find this everywhere, but it is not necessarily a good way to solve this problem.
The more standards-compliant way is as follows:
#include <locale>

int main()
{
    std::locale l("");       // or std::locale l("en_US.utf-8");
    std::locale::global(l);  // or std::wcout.imbue(l); std::wcin.imbue(l);
    // your code here
}
This should work with other modern compilers and operating systems too.
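Applied to the same Greek example, a sketch of the locale-based approach (assuming the environment's default locale can represent Greek, e.g. a UTF-8 locale):

#include <iostream>
#include <locale>
#include <string>

int main()
{
    std::locale::global(std::locale("")); // environment's default locale
    std::wcout.imbue(std::locale());      // make wcout use it
    std::wstring greekWord = L"\u0395\u03BB\u03BB\u03B7\u03BD\u03B9\u03BA\u03AC";
    std::wcout << greekWord << std::endl;
    return 0;
}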
Try this; it works for me:
#include <iostream>
#include <clocale>
#include <io.h>
#include <fcntl.h>
using namespace std;

int main() {
    setlocale(LC_ALL, "");                 // use the system's default locale
    _setmode(_fileno(stdout), _O_U16TEXT); // UTF-16 text mode, before any output
    wcout << L"Ελληνικά";
    return 0;
}
Related
I have tried many things but they don't seem to work.
#include <iostream>
#include <locale>
using namespace std;

int main() {
    setlocale(LC_ALL, "en-US.utf8");
    std::string uni = u8"ユチエフー8";
    std::cout << uni;
    std::cin.get();
    return 0;
}
Please note that I want UTF-8/UTF-32 to work not just in this situation but in most situations, without using things like wide characters.
Here is a little code snippet that reads a line from a UTF-8 file:
#include <iostream>
#include <io.h>
#include <fcntl.h>
#include <locale>
#include <fstream>
#include <codecvt>

int main()
{
    _setmode(_fileno(stdout), _O_U8TEXT);
    auto inputFileStream = std::wifstream("input.txt");
    const auto utf8Locale = std::locale(std::locale(), new std::codecvt_utf8<wchar_t>());
    inputFileStream.imbue(utf8Locale);
    std::wstring line;
    std::getline(inputFileStream, line);
    std::wcout << line << std::endl;
    inputFileStream.close();
    return 0;
}
When I build it with the Visual Studio Visual C++ compiler, I get the following result:
test τεστ тест
as expected.
But when I use MinGW with the GCC compiler, I get
琀攀猀琀 쐃딃쌃쐃 䈄㔄䄄䈄
As you can see, this is not the expected result.
Is there a simple way to fix the output for GCC so it produces the expected string?
OR
Is there a simple way to use UTF-8 with both MSVC and GCC?
Answer (thanks to Igor Tandetnik and Remy Lebeau):
It seems we must specify the endianness explicitly, because MSVC and GCC have different defaults. So
new std::codecvt_utf8<wchar_t, 0x10ffff, std::little_endian>()
should be used.
Fixed code:
#include <iostream>
#include <io.h>
#include <fcntl.h>
#include <locale>
#include <fstream>
#include <codecvt>

int main()
{
    _setmode(_fileno(stdout), _O_U8TEXT);
    auto inputFileStream = std::wifstream("input.txt");
    const auto utf8Locale = std::locale(std::locale(), new std::codecvt_utf8<wchar_t, 0x10ffff, std::little_endian>());
    inputFileStream.imbue(utf8Locale);
    std::wstring line;
    std::getline(inputFileStream, line);
    std::wcout << line << std::endl;
    inputFileStream.close();
    return 0;
}
For your second question, one option is to limit the use of UTF-16 and the std::w-prefixed types to the cases where you need to exchange UTF-16-encoded strings with the operating system. This happens when you receive arguments in wmain, open a file with _wfopen, call a Windows API function, and so on. Otherwise, you store, accept from the user, and return to the user UTF-8 strings using the char type (char*, std::string, etc.). Conversion between UTF-8 and UTF-16 can be done with MultiByteToWideChar and WideCharToMultiByte, bypassing the clunky C++ encoding API.

The place where this does not work well is console input/output. Overall, you can output UTF-8 to the console if the user sets chcp 65001 and a TTF font. At least on Windows 7, you also have to make sure not to split a character between two write calls, otherwise it will not print correctly (which also means you cannot use std::cout, because msvcrt will call putc for every byte separately; use puts, fprintf, etc. instead). I heard this was fixed in Windows 10, but cannot confirm. Reading UTF-8 from the console with the file API does not work as far as I know; if you want that, you need to detect that stdin is attached to a console and use the console API instead.
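A sketch of such conversion helpers built on the two Win32 calls mentioned above (the function names are my own; error handling is omitted for brevity):

#include <windows.h>
#include <string>

// Convert a UTF-8 std::string to a UTF-16 std::wstring via the Win32 API.
std::wstring Utf8ToUtf16(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();
    // First call computes the required length; second call does the conversion.
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), (int)utf8.size(), nullptr, 0);
    std::wstring utf16(len, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), (int)utf8.size(), &utf16[0], len);
    return utf16;
}

// Convert a UTF-16 std::wstring back to a UTF-8 std::string.
std::string Utf16ToUtf8(const std::wstring& utf16)
{
    if (utf16.empty()) return std::string();
    int len = WideCharToMultiByte(CP_UTF8, 0, utf16.data(), (int)utf16.size(), nullptr, 0, nullptr, nullptr);
    std::string utf8(len, '\0');
    WideCharToMultiByte(CP_UTF8, 0, utf16.data(), (int)utf16.size(), &utf8[0], len, nullptr, nullptr);
    return utf8;
}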
The thing is, I can copy and paste Unicode characters like chess pieces directly into the terminal (I'm using Debian Jessie Linux), but whenever I write C++ code to do the same, I get these � instead.
Here is my code:
#include <iostream>
#include <algorithm>
#include <cmath>
#include <cstdio>
using namespace std;

int main()
{
    setlocale(LC_ALL, "");
    wchar_t piece = '♗';
    wcout << piece;
}
I tried using the hex or decimal code of the characters, but it does not work.
I also use vim to edit and it does show the characters while I'm typing.
There's no specification of what encoding should be used for wchar_t. You need to use the mbstowcs function to convert that character. Like this, for example:
#include <iostream>
#include <clocale>
#include <cstdlib>
using namespace std;

int main(void) {
    setlocale(LC_ALL, "");
    wchar_t piece;
    mbstowcs(&piece, "♗", 1); // convert one multibyte character from the locale's encoding
    wcout << piece << endl;
    return 0;
}
assuming your source file encoding matches the encoding of your locale.
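Alternatively, a universal-character-name escape in the literal sidesteps the source-encoding question entirely. A minimal sketch (U+2657 is WHITE CHESS BISHOP):

#include <iostream>
#include <clocale>
using namespace std;

int main() {
    setlocale(LC_ALL, "");
    wchar_t piece = L'\u2657'; // U+2657 WHITE CHESS BISHOP, independent of source encoding
    wcout << piece << endl;
    return 0;
}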
Oddly enough, what worked was going at it normally and putting the special character into a string. It's so ridiculously simple I didn't even think to use it.
#include <iostream>
using namespace std;

int main()
{
    string piece = "♗";
    cout << piece;
}
I'm writing a program in C++ that is supposed to convert the letters in a text to uppercase (the program works, but setlocale does not). It gives me an error: [Error] 'LC_TYPE' was not declared in this scope. It "should" work, because it comes from my official faculty literature.
#include <iostream>
#include <string>
using namespace std;

int main() {
    cout << "Write something: " << endl;
    string tekst; // tekst = text
    getline(cin, tekst);
    setlocale(LC_TYPE, "croatian"); // here is the problem...
    for (char znak : tekst) { // znak = char, symbol...
        char velikoSlovo = toupper(znak); // velikoSlovo = uppercaseLetter
        cout << velikoSlovo;
    }
    cout << endl;
    return 0;
}
Does anyone know how to fix this?
I'm using Orwell Dev C++ 5.9.2. Language standard (-std) is ISO C++ 11.
Here is a picture.
Don't you need to add #include <clocale>, as described here?
Edit:
Actually, #include <locale.h> should be preferred to <clocale> to reduce portability issues. Thanks to @Cheers for mentioning it in the comments.
As the documentation says, you should use LC_CTYPE:
http://www.cplusplus.com/reference/clocale/setlocale/
Also, you need to provide a valid locale code, as described here.
So for Croatian, your line should look like:
setlocale(LC_CTYPE, "hr_HR.UTF-8"); // or just "hr_HR"
or you can just use:
setlocale(LC_ALL, "");
which should set the locale to the default one used by your computer.
And as suggested before, you may also need to add #include <clocale>
Add
#include <locale.h>
to get a declaration of setlocale and its associated constants.
In general, just read the relevant documentation for whatever function or related facility the compiler doesn't seem to recognize.
In general, the chosen approach may work with single-byte encodings, but not with multibyte encodings such as UTF-8. Single-byte encodings are commonly used on Windows; in Unix-land, it's UTF-8 that rules.
And for Windows, you generally want the system's default locale, so instead of
setlocale(LC_TYPE, "croatian");
you may well (though not necessarily) be better served by
setlocale(LC_ALL, "");
where the empty string selects the system locale rather than the default pure ASCII "C" locale.
Also, note that toupper from the C library requires a non-negative argument, or else the special value EOF. You can just cast the argument to unsigned char, when you know that it's not EOF.
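Putting both fixes together, a sketch of the corrected program might look like this (assuming a single-byte locale; as noted above, this approach won't handle multibyte UTF-8 text correctly):

#include <iostream>
#include <string>
#include <clocale>
#include <cctype>
using namespace std;

int main() {
    setlocale(LC_ALL, "");  // system default locale, set before classification calls
    cout << "Write something: " << endl;
    string tekst;
    getline(cin, tekst);
    for (char znak : tekst) {
        // cast to unsigned char so toupper never sees a negative value
        char velikoSlovo = static_cast<char>(toupper(static_cast<unsigned char>(znak)));
        cout << velikoSlovo;
    }
    cout << endl;
    return 0;
}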
Picking up C++ and having a go at it on OS X 10.9 using XCode 5.0.2 and using Alex Allain as reference.
The following code compiles just fine and outputs correctly
#include <iostream>
#include <string>
using namespace std;

int main()
{
    std::string user_first_name = "test";
    std::cout << user_first_name << "\n";
    return 0;
}
When I add a getline call, the code appears to compile but produces no output.
#include <iostream>
#include <string>
using namespace std;

int main()
{
    std::string user_first_name = "test";
    std::getline(std::cin, user_first_name, '\n');
    std::cout << user_first_name << "\n";
    return 0;
}
In fact, the debug navigator shows memory filling up with bars (although actual memory use is fixed at 276 KB). Why am I getting stumped by such a simple thing/concept?
I did a bit of digging around, and it's quite likely this is related to a text encoding issue. I'm using the defaults, which is Unicode (UTF-8). Encoding is not something I'm familiar with; it was never something I had to deal with when learning on Windows. How do I get past this?
I can't comment regarding the use of XCode or OS X, but it was my understanding that std::cin always gives you a narrow (single-byte) character stream. On Windows (at least with Visual Studio), I think it works whether you compile for UTF-8 (one byte for all ASCII characters) or UTF-16 (two bytes for all ASCII characters); the runtime library presumably does the conversion for you as necessary.
I'm not sure what "filling up with bars" means, but maybe you're just looking at uninitialized memory. If you think it is an encoding issue, perhaps try using wstring/wcin instead of string/cin and see if that helps.
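For reference, a minimal sketch of that wstring/wcin variant (just the suggestion above spelled out; whether it helps depends on the terminal and locale):

#include <iostream>
#include <string>

int main()
{
    std::wstring user_first_name = L"test";
    // Read a wide line from standard input up to the newline.
    std::getline(std::wcin, user_first_name, L'\n');
    std::wcout << user_first_name << L"\n";
    return 0;
}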