How to display Vietnamese characters in C++?

So I'm currently learning C++11 and I'm getting the hang of it. I want to play around with using a different language and since I'm Vietnamese, I want to make a C++ program that uses Vietnamese characters.
So how can I display Vietnamese characters the same way that English is displayed, which is like this:
cout << "Hello. This is English" << endl; //English
cout << "Chào. Đây là tiếng Việt." << endl; //Vietnamese
I heard that C++ has <locale>. Does it help make the Vietnamese characters appear?

You may be running into a problem with your environment. You don't say what platform/environment you are running in, but take the following program:
#include <iostream>
#include <cstdlib>

int main()
{
    std::cout << u8"Chào thế giới!" << std::endl;
    return EXIT_SUCCESS;
}
This yields the following output from iTerm on Mac OS X:
Chào thế giới!
With other (non-Unicode) environments, using the same code, you may get the raw UTF-8 bytes interpreted as single-byte characters on output. I don't know what the Windows command line will yield, but if you are using an IDE, the IDE may or may not render UTF-8, independently of whether your shell does.
Here are two web examples.
https://code.sololearn.com/c39N9RN6b4Md/#cpp yields:
Chào thế giới!
But http://ideone.com/OkkUZs, running exactly the same code, yields the UTF-8 bytes mis-rendered as single-byte characters:
ChÃ o tháº¿ giá»›i!
It's probably also worth pointing out that to properly process UTF-8 strings in C++ (count "characters", check that your strings are valid UTF-8, and so on) you will likely want to use a Unicode library; working with Unicode is non-trivial.
Personally, I have found both UTFCPP and TinyUTF8 to be excellent libraries - reasonably small, simple and effective.
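As a rough illustration of why a library helps, here is a minimal hand-rolled sketch of my own (not from either library): it counts code points in a UTF-8 string by skipping continuation bytes, and it assumes the input is already valid UTF-8, with none of the validation those libraries provide.
#include <cstddef>
#include <iostream>
#include <string>

// Count Unicode code points in a UTF-8 string by counting bytes that are
// not continuation bytes (continuation bytes have the bit pattern 10xxxxxx).
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char c : utf8)
        if ((c & 0xC0) != 0x80)
            ++count;
    return count;
}

int main()
{
    std::string s = u8"Chào thế giới!";   // u8 literal; fine under C++11/14/17
    std::cout << s.size() << " bytes, "
              << count_code_points(s) << " code points\n";
}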
Hope that helps.

#include <iostream>
#include <io.h>
#include <fcntl.h>

int main() {
    // Put stdout into UTF-16 mode so wide output reaches the console intact.
    _setmode(_fileno(stdout), _O_U16TEXT);
    std::wcout << L"Chào. Đây là tiếng Việt.";
}
This is a solution that works on Windows. Unfortunately it's not portable to other platforms.
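If you need the same source to build on other platforms too, a common pattern (my own sketch, not part of the answer above) is to guard the Windows-specific calls:
#include <iostream>
#ifdef _WIN32
#include <io.h>
#include <fcntl.h>
#endif

int main() {
#ifdef _WIN32
    // Windows: put stdout into UTF-16 mode and use wide output.
    _setmode(_fileno(stdout), _O_U16TEXT);
    std::wcout << L"Chào. Đây là tiếng Việt.\n";
#else
    // Elsewhere (Linux, macOS): UTF-8 narrow output generally just works.
    std::cout << "Chào. Đây là tiếng Việt.\n";
#endif
}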


Using UTF-16 for I/O with Visual Studio instead of code pages

I have this working on Visual Studio 2019 using code pages:
#include <windows.h>
#include <iostream>

int main()
{
    UINT oldcp = GetConsoleOutputCP();
    SetConsoleOutputCP(932); // 932 = Japanese.
    // 1200 for little-, 1201 big-, endian UTF-16
    DWORD used;
    WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), L"私の犬\n", 4, &used, 0);
    std::cout << "Hit enter to end."; std::cin.get();
    SetConsoleOutputCP(oldcp);
    return 0;
}
But I am seeing from Microsoft that I should not be using code pages except to interface with legacy code -- use UTF-16 instead. I can find code pages for UTF-16 (little endian or big endian), but using them doesn't work and it's still using code pages.
So what can I use that accomplishes what my program does, but is up-to-date?
Set stdin and stdout to wide mode in Windows and use wcout and wcin with wide strings. You'll need a console font that supports the characters, and an IME to type them, which can be accomplished by installing the appropriate language support. Setting a code page switches the font automatically, but the characters are output correctly even in the "wrong" code page: if you select a font that supports them, they will display.
#include <iostream>
#include <string>
#include <io.h>
#include <fcntl.h>
#include <cwchar>   // getwchar

int main()
{
    _setmode(_fileno(stdout), _O_U16TEXT);
    _setmode(_fileno(stdin), _O_WTEXT);

    std::wcout << L"私の犬" << std::endl;

    std::wstring a;
    std::wcout << L"Type a string: ";
    std::getline(std::wcin, a);
    std::wcout << a << std::endl;

    getwchar();
}
Output (terminal using code page 437 but NSimSun font):
私の犬
Type a string: 马克
马克
Technically every character encoding is a code page. To use UTF-16 you still have to specify the UTF-16 "code page", but you also need to call _setmode first. See:
Output unicode strings in Windows console app
How do I print Unicode to the output console in C with Visual Studio?
_setmode(_fileno(stdout), _O_U16TEXT);
std::wcout << L"私の犬\n";
But is it up-to-date? No!!! The most reasonable way to print Unicode is to use the UTF-8 code page, which makes your app cross-platform and easier to maintain. See What is the Windows equivalent for en_US.UTF-8 locale? for details. Basically just:
target Windows SDK v17134 or newer, or use static linking to work on older Windows versions
change the code page to UTF-8
use the -A Win32 APIs instead of the -W ones if you're calling those directly (recommended by MS for portability, since most other platforms have been using UTF-8 for decades)
set the /execution-charset:utf-8 and/or /utf-8 flags while compiling
std::setlocale(LC_ALL, ".UTF8");
std::cout << "私の犬\n";
See also Is it possible to set "locale" of a Windows application to UTF-8?
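Putting those steps together, a minimal self-contained sketch (my own, assuming MSVC with the /utf-8 flag and a runtime new enough to accept the ".UTF8" locale) might look like this:
#include <clocale>
#include <iostream>

int main()
{
    // Ask the CRT to treat narrow strings as UTF-8 (needs the newer UCRT).
    std::setlocale(LC_ALL, ".UTF8");
    // With /utf-8 the literal below is stored as UTF-8 bytes in the executable.
    std::cout << "私の犬\n";
}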

Printing degrees (°) and the cube symbol (³) to the Windows console

I'm working on a C++ Windows console program and I need to print the degree (°) and cube (³) symbols.
There's tons of info on the ° and the only way that worked for me was:
cout << value << "\370 C" << endl;
Now, what terminology is this? I need the same thing for ³.
I've read somewhere that \370 is an octal code, but I cannot find any relevant chart that lists it that way, or any equivalent for ³.
You can try something like
cout << value << (char)176 << " C" << endl;
with the number that is cast to char being the decimal code of the character in the console's code page (these are not ASCII values; they assume a Latin-1/Windows-1252 layout).
³ should be 0xB3 in hexadecimal, 179 in decimal.
For more, watch this.
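For what it's worth, \370 is just the octal spelling of the byte 0xF8 (decimal 248); which glyph a raw byte produces depends entirely on the console's active code page. Here is a small sketch of my own showing equivalent octal, hex and decimal spellings, assuming a code page where 0xB0 is ° and 0xB3 is ³ (e.g. Windows-1252):
#include <iostream>

int main()
{
    double value = 25.0;

    // Three spellings of the same byte 0xB0 (176). Under Windows-1252 this is °;
    // under the default OEM code page 437 the degree sign lives at 0xF8 (248, "\370").
    std::cout << value << "\260 C\n";            // octal escape
    std::cout << value << "\xB0 C\n";            // hex escape
    std::cout << value << (char)176 << " C\n";   // decimal value cast to char

    // Superscript three, assuming Windows-1252 (0xB3 = 179).
    std::cout << "1 m\263 = 1000 L\n";
    return 0;
}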
Make your life easier and use Unicode. Using Unicode you don't need to explicitly encode non-ASCII characters, you just include them in your source as-is. This also makes your program independent from the code page of the console, which could be different in another country.
Steps needed:
Save your source file in a Unicode encoding (UTF-8 works well; UTF-16 works too, but some version control software has issues with the latter).
At the beginning of your program, call _setmode(_fileno(stdout), _O_U16TEXT) once. This switches the standard output to UTF-16 encoding. UTF-16 is the preferred Windows encoding as the OS uses it internally, so no conversion overhead will occur.
Use std::wcout instead of std::cout everywhere. Never mix both.
Always use wide (UTF-16) string literals via the L prefix.
Make sure that a console font is selected that actually includes these symbols (very likely as these are quite common).
#include <iostream>
#include <io.h>
#include <fcntl.h>

int wmain(int argc, wchar_t* argv[])
{
    // Switch stdout encoding to UTF-16.
    _setmode(_fileno(stdout), _O_U16TEXT);

    // Output UTF-16 string literal.
    std::wcout << L"°³" << std::endl;
}

Cannot wcout wide string other than English [duplicate]

I've read a bunch of articles and forum posts discussing this problem, and all of the solutions seem way too complicated for such a simple task.
Here's some sample code straight from cplusplus.com:
// reading a text file
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main () {
    string line;
    ifstream myfile ("example.txt");
    if (myfile.is_open())
    {
        while ( myfile.good() )
        {
            getline (myfile,line);
            cout << line << endl;
        }
        myfile.close();
    }
    else cout << "Unable to open file";
    return 0;
}
It works fine as long as example.txt has only ASCII characters. Things get messy if I try to add, say, something in Russian.
In GNU/Linux it's as simple as saving the file as UTF-8.
In Windows, that doesn't work. Converting the file into UCS-2 Little Endian (what Windows seems to use by default) and changing all the functions into their wchar_t counterparts doesn't do the trick either.
Isn't there some kind of a "correct" way to get this done without doing all kinds of magic encoding conversions?
The Windows console supports Unicode, sort of. It does not support right-to-left text or "complex scripts". To print a UTF-16 file with Visual C++, use the following:
_setmode(_fileno(stdout), _O_U16TEXT);
And use wcout instead of cout.
There is no support for a "UTF8" code page, so for UTF-8 you will have to use MultiByteToWideChar.
More on console support for Unicode can be found in this blog.
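A rough sketch of that MultiByteToWideChar route (my own illustration, not code from the answer): take UTF-8 bytes, widen them to UTF-16, and print through wcout with stdout in UTF-16 mode.
#include <windows.h>
#include <fcntl.h>
#include <io.h>
#include <iostream>
#include <string>

int main()
{
    _setmode(_fileno(stdout), _O_U16TEXT);

    // "кошка" as raw UTF-8 bytes (e.g. as read from a UTF-8 file).
    std::string utf8 = "\xD0\xBA\xD0\xBE\xD1\x88\xD0\xBA\xD0\xB0";

    // First call computes how many UTF-16 code units are needed.
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), (int)utf8.size(), NULL, 0);
    std::wstring wide(len, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), (int)utf8.size(), &wide[0], len);

    std::wcout << wide << std::endl;
    return 0;
}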
The right way to output to a console on Windows using cout is to first call GetConsoleOutputCP, and then convert the input you have into the console code page. Alternatively, use WriteConsoleW, passing a wchar_t*.
For reading UTF-8 or UTF-16 strings from a file, you can use the extended mode string of _wfopen_s and fgetws. I don't think there is a C++ interface for these extensions yet. The easiest way to print to the console is described in Michael Kaplan's blog:
#include <fcntl.h>
#include <io.h>
#include <stdio.h>

int main(void) {
    _setmode(_fileno(stdout), _O_U16TEXT);
    wprintf(L"\x043a\x043e\x0448\x043a\x0430 \x65e5\x672c\x56fd\n");
    return 0;
}
Avoid GetConsoleOutputCP; it is only retained for compatibility with the 8-bit API.
While Windows console windows are UCS-2 based, they don't support UTF-8 properly.
You might make things work by setting the console window's active output code page to UTF-8 temporarily, using the appropriate API functions. Note that those functions distinguish between input code page and output code page. However, [cmd.exe] really doesn't like UTF-8 as active code page, so don't set that as a permanent code page.
Otherwise, you can use the Unicode console window functions.
Cheers & hth.,
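A sketch of that "temporary code page switch" idea, assuming the APIs meant are GetConsoleCP/SetConsoleCP (input) and GetConsoleOutputCP/SetConsoleOutputCP (output), and that the source is compiled so the literal is stored as UTF-8 bytes:
#include <windows.h>
#include <cstdio>

int main()
{
    UINT oldOut = GetConsoleOutputCP();   // output code page
    UINT oldIn  = GetConsoleCP();         // input code page, tracked separately

    SetConsoleOutputCP(CP_UTF8);
    SetConsoleCP(CP_UTF8);

    std::printf("Chào thế giới!\n");      // UTF-8 bytes pass straight through

    // Restore the previous code pages so the console isn't left on UTF-8 permanently.
    SetConsoleOutputCP(oldOut);
    SetConsoleCP(oldIn);
    return 0;
}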
#include <stdio.h>

int main (int argc, char *argv[])
{
    // do chcp 65001 in the console before running this
    printf("γασσο γεο!\n");
}
Works perfectly if you do chcp 65001 in the console before running your program.
Caveats:
I'm using 64 bit Windows 7 with VC++ Express 2010
The code is in a file encoded as UTF-8 without BOM - I wrote it in a text editor, not using the VC++ IDE, then used VC++ to compile it.
The console has a TrueType font - this is important
Don't know if these things make too much difference...
Can't speak for chars off the BMP, give it a whirl and leave a comment.
Just to be clear, some here have mentioned UTF-8. UTF-8 is a multi-byte encoding of Unicode, and some documentation loosely calls the two-byte encoding "Unicode". Unicode itself is not a fixed two-byte format: the Windows "Unicode" APIs use UTF-16, whose code units are two bytes each, and characters outside the BMP take two code units (a surrogate pair).
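To make the distinction concrete, here is a small standalone illustration of my own: a character outside the Basic Multilingual Plane occupies two UTF-16 code units but four UTF-8 bytes.
#include <iostream>
#include <string>

int main()
{
    // U+1F415 (a dog emoji) lies outside the BMP.
    std::u16string utf16 = u"\U0001F415";        // stored as a surrogate pair
    std::string    utf8("\xF0\x9F\x90\x95");     // the same character in UTF-8

    std::cout << "UTF-16 code units: " << utf16.size() << '\n';  // prints 2
    std::cout << "UTF-8 bytes:       " << utf8.size()  << '\n';  // prints 4
}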
I've used this previously posted solution with Visual Studio 2008. I don't know if it works with later versions of Visual Studio.
#include <iostream>
#include <fcntl.h>
#include <io.h>
#include <tchar.h>

<code omitted>

_setmode(_fileno(stdout), _O_U16TEXT);
std::wcout << _T("This is some text to print\n");
I used macros to switch between std::wcout and std::cout, and also to remove the _setmode call for ASCII builds, thus allowing compilation for either ASCII or UNICODE. This works. I have not yet tested std::endl, but I think that might work with wcout and Unicode (not sure), i.e.
std::wcout << _T("This is some text to print") << std::endl;

Is it possible to cout an EM DASH on Linux and Windows? [duplicate]

This question already has answers here:
Output Unicode to console Using C++, in Windows
I haven't been able to find a way to cout a '—' character. Whether I put it in the cout statement like this: cout << "—"; or use char(151), the program prints out a fuzzy undefined character. Do you guys see anything wrong with my code? Is couting an EM DASH even possible?
Edit: I've also tried wcout << L"—"; and std::wcout << wchar_t(0x2014);. Those both print nothing in my terminal.
First of all, EM DASH is a Unicode character (just making sure you do know that).
Printing Unicode characters depends on what you're printing to.
If you're printing to a Unix terminal (or an emulator), and the terminal emulator is using an encoding that supports this character, and that encoding matches the compiler's execution encoding, then you can do exactly what you did above in your source code: cout << "—";
If you're getting fuzzy undefined characters, it is possible that your terminal just doesn't support that character.
If you're on Windows (where it is harder), you can do something like this (which is not portable):
#include <iostream>
#include <io.h>
#include <fcntl.h>

int main() {
    _setmode(_fileno(stdout), _O_U16TEXT);
    std::wcout << L"—";
}
There's no universal support for Unicode in C++ and in various terminals, so there won't be a portable solution.
The thing is that the Windows console uses code pages by default. It probably uses UTF-16 internally but will always convert to and from the current code page when interacting with the outside world. So simply printing a UTF-16 code point like std::wcout << wchar_t(0x2014); won't work without prior setup. You need either to switch the console to UTF-8 by running chcp 65001, or to put stdout into UTF-16 mode with _setmode(_fileno(stdout), _O_U16TEXT); in code, before printing the character with
std::wcout << L"—";
It will not always work, because of the poorer Unicode support in the Windows console. In many cases the characters don't appear due to issues in the renderer or the font, and are replaced with squares or ????. But in that case just copy the text out and paste it into any Unicode-aware text box and it will be displayed properly.
If you're using Windows in English or another Western European language that uses code page 1252 (Windows' superset of ISO-8859-1), then you can print the em dash, which sits at code point 151 in that code page, simply with
cout << (char)151;
If it doesn't work then you're not on code page 1252. You can change to 1252 if possible, or look up the em dash in your code page (if it's available there).
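If you're not sure which code page your console is actually on, you can query it first; a tiny sketch of my own using the Win32 console API:
#include <windows.h>
#include <iostream>

int main()
{
    UINT cp = GetConsoleOutputCP();
    std::cout << "Active console output code page: " << cp << '\n';
    if (cp == 1252)
        std::cout << (char)151 << '\n';   // em dash at 151 only in Windows-1252
    else
        std::cout << "Byte 151 is not an em dash in this code page.\n";
    return 0;
}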
On Linux things are much simpler because UTF-8 is used by default. So you can output the string as normal without resorting to std::wcout:
std::cout << "—"; // need to make sure that std::string is in UTF-8
// or use std::cout << u8"—" to force the encoding
In fact you'll often get surprising results if you use wide strings on Linux: std::wcout << L"—" often won't work, because of possible bugs in libc.
That said, the Windows 10 console now supports UTF-8 properly and even allows you to use UTF-8 as the locale, so if you don't need to support Windows 7 then there's a universal method to print any Unicode string:
std::cout << u8"—";
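A cross-platform sketch of that universal method (my own, assuming MSVC's /utf-8 flag and a Windows 10 or newer console):
#include <iostream>
#ifdef _WIN32
#include <windows.h>
#endif

int main()
{
#ifdef _WIN32
    // Tell the console to interpret the bytes we write as UTF-8.
    SetConsoleOutputCP(CP_UTF8);
#endif
    // On Linux the terminal already expects UTF-8, so no setup is needed.
    std::cout << u8"—\n";
}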
