Output unicode strings in Windows console app - c++

Hi, I was trying to output a Unicode string to a console with iostreams and failed.
I found this: Using unicode font in c++ console app, and this snippet works.
SetConsoleOutputCP(CP_UTF8);
wchar_t s[] = L"èéøÞǽлљΣæča";
int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
char* m = new char[bufferSize];
WideCharToMultiByte(CP_UTF8, 0, s, -1, m, bufferSize, NULL, NULL);
wprintf(L"%S", m);
delete[] m; // release the conversion buffer
However, I did not find any way to output unicode correctly with iostreams. Any suggestions?
This does not work:
SetConsoleOutputCP(CP_UTF8);
utf8_locale = locale(old_locale,new boost::program_options::detail::utf8_codecvt_facet());
wcout.imbue(utf8_locale);
wcout << L"¡Hola!" << endl;
EDIT
I could not find any other solution than to wrap this snippet in a stream operator.
I hope somebody has better ideas.
//Unicode output for a Windows console
ostream &operator-(ostream &stream, const wchar_t *s)
{
int bufSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
char *buf = new char[bufSize];
WideCharToMultiByte(CP_UTF8, 0, s, -1, buf, bufSize, NULL, NULL);
wprintf(L"%S", buf);
delete[] buf;
return stream;
}
ostream &operator-(ostream &stream, const wstring &s)
{
stream - s.c_str();
return stream;
}

I have verified a solution here using Visual Studio 2010, via this MSDN article and MSDN blog post. The trick is an obscure call to _setmode(..., _O_U16TEXT).
Solution:
#include <iostream>
#include <io.h>
#include <fcntl.h>
int wmain(int argc, wchar_t* argv[])
{
_setmode(_fileno(stdout), _O_U16TEXT);
std::wcout << L"Testing unicode -- English -- Ελληνικά -- Español." << std::endl;
}

Unicode Hello World in Chinese
Here is a Hello World in Chinese. Actually it is just "Hello". I tested this on Windows 10, but I think it should work on Windows Vista and later.
Before Windows Vista it will be hard if you want a programmatic solution rather than configuring the console, the registry, etc. Maybe have a look here if you really need to do this on Windows 7: Change console Font Windows 7
I don't want to claim this is the only solution, but this is what worked for me.
Outline
Unicode project setup
Set the console codepage to unicode
Find and use a font that supports the characters you want to display
Use the locale of the language you want to display
Use the wide character output i.e. std::wcout
1 Project Setup
I am using Visual Studio 2017 CE. I created a blank console app. The default settings are fine, but if you experience problems or use a different IDE, you might want to check these:
In your project properties find configuration properties -> General -> Project Defaults -> Character Set. It should be "Use Unicode Character Set" not "Multi-Byte".
This will define _UNICODE and UNICODE preprocessor macros for you.
int wmain(int argc, wchar_t* argv[])
Also, I think we should use the wmain function instead of main. Both work, but in a Unicode environment wmain may be more convenient.
My source files are UTF-16 LE encoded, which seems to be the default in Visual Studio 2017.
2 Console Codepage
We need the UTF-8 codepage in the console.
If you want to check your default codepage, just open a console and type chcp without any arguments.
We have to change it to 65001, which is the UTF-8 codepage (see Windows Codepage Identifiers).
There is a preprocessor macro for that codepage: CP_UTF8.
I needed to set both the input and the output codepage. When I omitted either one, the output was incorrect.
SetConsoleOutputCP(CP_UTF8);
SetConsoleCP(CP_UTF8);
You might also want to check the boolean return values of those functions.
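Checking those return values could look roughly like the sketch below. The helper name enable_utf8_console is made up for illustration, and the non-Windows branch is only an assumption so that the snippet compiles elsewhere; SetConsoleOutputCP, SetConsoleCP and GetLastError are the regular Win32 calls.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not from the answer): switch the console's input and
// output code pages to UTF-8 and report whether both calls succeeded.
bool enable_utf8_console() {
#ifdef _WIN32
    if (!SetConsoleOutputCP(CP_UTF8)) {
        std::fprintf(stderr, "SetConsoleOutputCP failed, error %lu\n", GetLastError());
        return false;
    }
    if (!SetConsoleCP(CP_UTF8)) {
        std::fprintf(stderr, "SetConsoleCP failed, error %lu\n", GetLastError());
        return false;
    }
    return true;
#else
    // Assumption: on other platforms there is no console code page to switch.
    return true;
#endif
}
```

Calling it once at the start of wmain and bailing out (or falling back to plain ASCII output) when it returns false keeps the failure visible instead of silently printing garbage.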
3 Choose a Font
So far I haven't found a console font that supports every character, so I had to choose one.
If some of the characters you want to output are only available in one font and some only in another, then I believe there is no solution, unless there is a font out there that supports every character. I also didn't look into how to install a font.
I think it is not possible to use two different fonts in the same console window at the same time.
How to find a compatible font?
Open your console and go to the properties of the console window by clicking on the icon in the upper left of the window. Go to the Font tab, choose a font and click OK. Then try to enter your characters in the console window. Repeat this until you find a font you can work with, then note down the name of the font.
You can also change the size of the font in the properties window.
Once you have found a size you are happy with, note down the size values that are displayed in the properties window in the section "selected font". It shows the width and height in pixels.
To actually set the font programmatically you use:
CONSOLE_FONT_INFOEX fontInfo;
// ... configure fontInfo
SetCurrentConsoleFontEx(hConsole, false, &fontInfo);
See my example at the end of this answer for details.
Or look it up in the fine manual: SetCurrentConsoleFontEx.
This function has only existed since Windows Vista.
4 Set the locale
You will need to set the locale to that of the language whose characters you want to print.
char* a = setlocale(LC_ALL, "chinese");
The return value is interesting. It contains a string that describes exactly which locale was chosen.
Just give it a try :-)
I tested with Chinese and German.
More info: setlocale
5 Use wide character output
Not much to say here.
If you want to output wide characters, use this for example:
std::wcout << L"你好" << std::endl;
Oh, and don't forget the L prefix for wide characters!
If you type literal Unicode characters like this in the source file, the source file must be saved in a Unicode encoding, such as UTF-16 LE (which seems to be the default in Visual Studio). Or use Notepad++ and set the encoding to UCS-2 LE BOM.
Example
Finally I put it all together as an example:
#include <Windows.h>
#include <algorithm> // std::copy
#include <iostream>
#include <io.h>
#include <fcntl.h>
#include <locale.h>
#include <wincon.h>
int wmain(int argc, wchar_t* argv[])
{
SetConsoleTitle(L"My Console Window - 你好");
HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE);
char* a = setlocale(LC_ALL, "chinese");
SetConsoleOutputCP(CP_UTF8);
SetConsoleCP(CP_UTF8);
CONSOLE_FONT_INFOEX fontInfo = {}; // zero-initialize so that unset fields do not hold garbage
fontInfo.cbSize = sizeof(fontInfo);
fontInfo.FontFamily = 54;
fontInfo.FontWeight = 400;
fontInfo.nFont = 0;
const wchar_t myFont[] = L"KaiTi";
fontInfo.dwFontSize = { 18, 41 };
std::copy(myFont, myFont + (sizeof(myFont) / sizeof(wchar_t)), fontInfo.FaceName);
SetCurrentConsoleFontEx(hConsole, false, &fontInfo);
std::wcout << L"Hello World!" << std::endl;
std::wcout << L"你好!" << std::endl;
return 0;
}
Cheers !
Edit on 2021-11-20
Maybe you can also try the new Windows Terminal. It seems to print Unicode out of the box, although you will still need to configure a font that supports your characters in its settings. It is developed by Microsoft as open source on GitHub, and you can also install it from the Microsoft Store. I successfully tried this on Windows 10.

You can use the open-source {fmt} library to portably print Unicode text, including on Windows, for example:
#include <fmt/core.h>
int main() {
fmt::print("èéøÞǽлљΣæča");
}
Output:
èéøÞǽлљΣæča
This requires compiling with the /utf-8 compiler option in MSVC.
I don't recommend using wcout because it is non-portable and doesn't even work on Windows without extra effort, for example:
std::wcout << L"èéøÞǽлљΣæča";
will print:
├и├й├╕├Ю╟╜╨╗╤Щ╬г├ж─Нa
on Russian Windows (ACP 1251, console CP 866).
Disclaimer: I'm the author of {fmt}.
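Whether a narrow string literal really ends up as UTF-8 bytes in the executable depends on the source file encoding and on compiler options such as /utf-8. A minimal sketch of a runtime check follows; the helper name narrow_literals_are_utf8 is made up for illustration and is not part of {fmt}.

```cpp
#include <cstring>

// Hypothetical helper: returns true when the compiler stored the narrow
// literal "é" as the two UTF-8 bytes C3 A9 (plus the terminating null).
bool narrow_literals_are_utf8() {
    const char literal[] = "é";
    const char expected[] = "\xC3\xA9";
    return sizeof(literal) == sizeof(expected) &&
           std::memcmp(literal, expected, sizeof(expected)) == 0;
}
```

If this returns false, the bytes handed to any UTF-8 based printing function are already in some other encoding, so the output cannot be right regardless of console settings.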

std::wcout presumably has its locale set differently from the CRT. Here's how it can be fixed:
int _tmain(int argc, _TCHAR* argv[])
{
char* locale = setlocale(LC_ALL, "English"); // Set the CRT locale and get its full name.
std::locale lollocale(locale); // Build a C++ locale from that name.
setlocale(LC_ALL, locale); // Re-apply the same locale to the CRT.
std::wcout.imbue(lollocale); // Give std::wcout the same locale as the CRT.
std::wcout << L"¡Hola!";
std::cin.get();
return 0;
}
I just tested it, and it displays the string here absolutely fine.

SetConsoleCP() and chcp do not do the same thing!
Take this program snippet:
SetConsoleCP(65001); // 65001 = UTF-8
static const char s[]="tränenüberströmt™\n";
DWORD slen=lstrlenA(s); // ANSI variant, because s is a char string
WriteConsoleA(GetStdHandle(STD_OUTPUT_HANDLE),s,slen,&slen,NULL);
The source code must be saved as UTF-8 without a BOM (Byte Order Mark; signature). Then the Microsoft compiler cl.exe takes the UTF-8 strings as-is.
If this code is saved with a BOM, cl.exe transcodes the string to ANSI (i.e. CP1252), which doesn't match CP65001 (= UTF-8).
Change the display font to Lucida Console; otherwise, UTF-8 output will not work at all.
Type: chcp
Answer: 850
Type: test.exe
Answer: tr├ñnen├╝berstr├ÂmtÔäó
Type: chcp
Answer: 65001 - This setting was changed by SetConsoleCP(), but with no useful effect.
Type: chcp 65001
Type: test.exe
Answer: tränenüberströmt™ - All OK now.
Tested with: German Windows XP SP3
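The garbled first run is what UTF-8 bytes look like when the console decodes them with code page 850: "ä" (U+00E4) is the two bytes C3 A4 in UTF-8, and CP850 shows 0xC3 as "├" and 0xA4 as "ñ". A small sketch for inspecting the bytes that are actually sent; hex_bytes is a hypothetical helper, not something from the answer.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: format each byte of a string as two upper-case hex
// digits, separated by spaces.
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof(buf), "%02X", static_cast<unsigned>(c));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

For example, hex_bytes("\xC3\xA4") returns "C3 A4". Dumping the string before writing it shows whether the program or the console's code page is responsible for the mojibake.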

Solution 1: use WCHAR
One thing that always works: use wide characters everywhere. For example:
const wchar_t* str = L"你好\n";
DWORD nwritten = 0;
WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), str, static_cast<DWORD>(wcslen(str)), &nwritten, NULL); // length in wchar_t units, here 3
Unicode is language neutral. You can use any language and won't have encoding issues. You want to use UTF-8? Fine. Use MultiByteToWideChar to convert it to a wide character string first.
Before you continue reading the other solution below, please note that this one has a unique advantage: it doesn't depend on the system or the user's locale settings.
Solution 2: Set the system locale and the user locale properly, and make them the same.
I assume a UTF-8 locale for Windows isn't in the picture yet. Then you need to know which language (Chinese, French?) you will use, and change your system settings to match it. There is a system level setting:
And a user level setting:
Please set both of them to the same language.
Then, in your program, insert "setlocale(LC_ALL, "");" into your main function. It's a universal rule: no matter which OS you use, whenever you want to use the standard library to process character sets other than ASCII, you should have this line of code. Otherwise, the locale defaults to "C", which only covers ASCII. Then you can start to use std::wcout and C functions like fputws.
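The default "C" locale and the effect of setlocale(LC_ALL, "") can be observed with a short sketch like this one. The two helper names are made up for illustration; the locale name reported after the switch depends entirely on the machine's settings.

```cpp
#include <clocale>
#include <string>

// Hypothetical helper: name of the locale the C runtime is currently using.
// Passing nullptr only queries the locale; it does not change it.
std::string current_c_locale() {
    const char* name = std::setlocale(LC_ALL, nullptr);
    return name ? name : "";
}

// Hypothetical helper: adopt the user's environment locale, as recommended
// above. Returns false if the runtime rejects the request.
bool adopt_user_locale() {
    return std::setlocale(LC_ALL, "") != nullptr;
}
```

At program startup current_c_locale() reports "C". After a successful adopt_user_locale() it reports whatever the system and user settings select, which is why those settings have to match the language you want to print.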

I don't think there is an easy answer. Looking at Console Code Pages and the SetConsoleCP function, it seems that you will need to set up an appropriate codepage for the character set you're going to output.

Recently I wanted to stream Unicode from Python to the Windows console, and here is the minimum I needed to do:
You should set the console font to one that covers Unicode symbols. There is not a wide choice: Console properties > Font > Lucida Console
You should change the current console codepage: run chcp 65001 in the console or use the corresponding method in the C++ code
Write to the console using WriteConsoleW
Look through an interesting article about Java Unicode on the Windows console
Besides, in Python you cannot write to the default sys.stdout in this case; you will need to substitute it with something using os.write(1, binarystring) or a direct call to a wrapper around WriteConsoleW. It seems that in C++ you will need to do the same.

First, sorry, I probably don't have the required fonts, so I cannot test this yet.
Something looks a bit fishy here:
// the following is said to be working
SetConsoleOutputCP(CP_UTF8); // output is in UTF8
wchar_t s[] = L"èéøÞǽлљΣæča";
int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
char* m = new char[bufferSize];
WideCharToMultiByte(CP_UTF8, 0, s, -1, m, bufferSize, NULL, NULL);
wprintf(L"%S", m); // <-- upper case %S in wprintf() is used for MultiByte/utf-8
// lower case %s in wprintf() is used for WideChar
printf("%s", m); // <-- does this work as well? try it to verify my assumption
while
// the following is said to have problem
SetConsoleOutputCP(CP_UTF8);
utf8_locale = locale(old_locale,
new boost::program_options::detail::utf8_codecvt_facet());
wcout.imbue(utf8_locale);
wcout << L"¡Hola!" << endl; // <-- you are passing wide char.
// have you tried passing the multibyte equivalent by converting to utf8 first?
int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
char* m = new char[bufferSize];
WideCharToMultiByte(CP_UTF8, 0, s, -1, m, bufferSize, NULL, NULL);
cout << m << endl;
what about
// without setting locale to UTF8, you pass WideChars
wcout << L"¡Hola!" << endl;
// set locale to UTF8 and use cout
SetConsoleOutputCP(CP_UTF8);
cout << utf8_encoded_by_converting_using_WideCharToMultiByte << endl;

There are a few issues with msvcrt and iostreams.
The _setmode(_fileno(stdout), _O_U16TEXT); trick works only with MS VC++, not MinGW-GCC. Moreover, it sometimes leads to crashes, depending on the Windows configuration.
SetConsoleCP(65001) for UTF-8 may fail in many multibyte character scenarios, but it is always OK for UTF-16LE.
You need to restore the previous console codepage on application exit.
The Windows console supports Unicode through the ReadConsole and WriteConsole functions in UTF-16LE mode. A side effect is that piping will not work in this case, i.e. myapp.exe >> ret.log produces a 0-byte ret.log file. If you are OK with this, you can try my library as follows.
const char* umessage = "Hello!\nПривет!\nПривіт!\nΧαιρετίσματα!\nHelló!\nHallå!\n";
...
#include <console.hpp>
#include <ios>
...
std::ostream& cout = io::console::out_stream();
cout << umessage
<< 1234567890ull << '\n'
<< 123456.78e+09 << '\n'
<< 12356.789e+10L << '\n'
<< std::hex << 0xCAFEBABE
<< std::endl;
The library auto-converts your UTF-8 into UTF-16LE and writes it to the console using WriteConsole. There are error and input streams as well.
Another benefit of the library is color support.
Link to the example app:
https://github.com/incoder1/IO/tree/master/examples/iostreams
The library homepage:
https://github.com/incoder1/IO
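Independently of the library, the point above about restoring the previous console codepage on exit can be sketched with a small RAII guard. The class name ConsoleOutputCodePageGuard is hypothetical, and the non-Windows branch is only an assumption so that the snippet compiles elsewhere; GetConsoleOutputCP and SetConsoleOutputCP are the regular Win32 calls.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical RAII guard: switches the console output code page and puts
// the previous one back when the guard goes out of scope.
class ConsoleOutputCodePageGuard {
public:
    explicit ConsoleOutputCodePageGuard(unsigned int code_page) {
#ifdef _WIN32
        previous_ = GetConsoleOutputCP();  // a result of 0 is treated as failure here
        changed_ = previous_ != 0 && SetConsoleOutputCP(code_page) != 0;
#else
        (void)code_page;  // assumption: nothing to switch on other platforms
#endif
    }
    ~ConsoleOutputCodePageGuard() {
#ifdef _WIN32
        if (changed_) SetConsoleOutputCP(previous_);
#endif
    }
    ConsoleOutputCodePageGuard(const ConsoleOutputCodePageGuard&) = delete;
    ConsoleOutputCodePageGuard& operator=(const ConsoleOutputCodePageGuard&) = delete;

    bool changed() const { return changed_; }
    unsigned int previous_code_page() const { return previous_; }

private:
    unsigned int previous_ = 0;
    bool changed_ = false;
};
```

Declaring ConsoleOutputCodePageGuard guard(65001); at the top of main restores the old code page on every normal return path, so the user's console is not left in UTF-8 mode after the program ends.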

Correctly displaying Western European characters in the windows console
Long story short:
use chcp to find which codepage works for you. In my case it was chcp 28591 for Western Europe.
optionally make it the default: REG ADD HKCU\Console /v CodePage /t REG_DWORD /d 28591
History of the discovery
I had a similar problem, with Java. It is just cosmetic, since it involves log lines sent to the console; but it is still annoying.
The output from our Java application is supposed to be in UTF-8 and it displays correctly in eclipse's console. But in windows console, it just shows the ASCII box-drawing characters: Inicializaci├│n and art├¡culos instead of Inicialización and artículos.
I stumbled upon a related question and mixed some of the answers to get to the solution that worked for me. The solution is to change the codepage used by the console and to use a font that supports Unicode (like Consolas or Lucida Console). You can select the font in the system menu of the Windows console:
Start a console by any one of
Win + R then type cmd and hit the Return key.
Hit the Win key and type cmd followed by the return key.
Open the system menu by any one of
click the upper left corner icon
Hit the Alt + Space key combination
then select "Default" to change the behavior of all subsequent console windows
click the "Font" tab
Select Consolas or Lucida console
Click OK
Regarding the codepage, for a one-off case, you can get it done with the command chcp and then you have to investigate which codepage is correct for your set of characters. Several answers suggested UTF-8 codepage, which is 65001, but that codepage didn't work for my Spanish characters.
Another answer suggested a batch script for interactively selecting the codepage you want from a list. There I found the codepage for ISO-8859-1 that I needed: 28591. So you could execute
chcp 28591
before each execution of your application. You might check which code page is right for you in the Code Page Identifiers MSDN page.
Yet another answer indicated how to persist the selected codepage as the default for your windows console. It involves changing the registry, so consider yourself warned that you might brick your machine by using this solution.
REG ADD HKCU\Console /v CodePage /t REG_DWORD /d 28591
This creates the CodePage value with the 28591 data inside the HKCU\Console registry key. And that did work for me.
Please note that HKCU ("HKEY_CURRENT_USER") is only for the current user. If you want to change it for all users in that computer, you'll need to use the regedit utility and find/create the corresponding Console key (probably you'll have to create a Console key inside HKEY_USERS\.DEFAULT)

Default encoding on:
Windows UTF-16.
Linux UTF-8.
MacOS UTF-8.
My solution, step by step. It preserves embedded null characters (\0) instead of truncating at them, and the conversions do not use functions from the windows.h header:
Add macros to detect the platform.
#if defined (_WIN32)
#define WINDOWSLIB 1
#elif defined (__ANDROID__) || defined(ANDROID)//Android
#define ANDROIDLIB 1
#elif defined (__APPLE__)//iOS, Mac OS
#define MACOSLIB 1
#elif defined (__LINUX__) || defined(__gnu_linux__) || defined(__linux__)//_Ubuntu - Fedora - Centos - RedHat
#define LINUXLIB 1
#endif
Create conversion functions from std::wstring to std::string and vice versa.
#include <locale>
#include <iostream>
#include <string>
#ifdef WINDOWSLIB
#include <Windows.h>
#endif
using namespace std::literals::string_literals;
// Convert std::wstring to std::string
std::string WidestringToString(const std::wstring& wstr, const std::string& locale)
{
if (wstr.empty())
{
return std::string();
}
size_t pos;
size_t begin = 0;
std::string ret;
size_t size;
#ifdef WINDOWSLIB
_locale_t lc = _create_locale(LC_ALL, locale.c_str());
pos = wstr.find(static_cast<wchar_t>(0), begin);
while (pos != std::wstring::npos && begin < wstr.length())
{
std::wstring segment = std::wstring(&wstr[begin], pos - begin);
_wcstombs_s_l(&size, nullptr, 0, &segment[0], _TRUNCATE, lc);
std::string converted = std::string(size, 0);
_wcstombs_s_l(&size, &converted[0], size, &segment[0], _TRUNCATE, lc);
ret.append(converted);
begin = pos + 1;
pos = wstr.find(static_cast<wchar_t>(0), begin);
}
if (begin <= wstr.length()) {
std::wstring segment = std::wstring(&wstr[begin], wstr.length() - begin);
_wcstombs_s_l(&size, nullptr, 0, &segment[0], _TRUNCATE, lc);
std::string converted = std::string(size, 0);
_wcstombs_s_l(&size, &converted[0], size, &segment[0], _TRUNCATE, lc);
converted.resize(size - 1);
ret.append(converted);
}
_free_locale(lc);
#elif defined LINUXLIB
std::string currentLocale = setlocale(LC_ALL, nullptr);
setlocale(LC_ALL, locale.c_str());
pos = wstr.find(static_cast<wchar_t>(0), begin);
while (pos != std::wstring::npos && begin < wstr.length())
{
std::wstring segment = std::wstring(&wstr[begin], pos - begin);
size = wcstombs(nullptr, segment.c_str(), 0);
std::string converted = std::string(size, 0);
wcstombs(&converted[0], segment.c_str(), converted.size());
ret.append(converted);
ret.append({ 0 });
begin = pos + 1;
pos = wstr.find(static_cast<wchar_t>(0), begin);
}
if (begin <= wstr.length()) {
std::wstring segment = std::wstring(&wstr[begin], wstr.length() - begin);
size = wcstombs(nullptr, segment.c_str(), 0);
std::string converted = std::string(size, 0);
wcstombs(&converted[0], segment.c_str(), converted.size());
ret.append(converted);
}
setlocale(LC_ALL, currentLocale.c_str());
#elif defined MACOSLIB
#endif
return ret;
}
// Convert std::string to std::wstring
std::wstring StringToWideString(const std::string& str, const std::string& locale)
{
if (str.empty())
{
return std::wstring();
}
size_t pos;
size_t begin = 0;
std::wstring ret;
size_t size;
#ifdef WINDOWSLIB
_locale_t lc = _create_locale(LC_ALL, locale.c_str());
pos = str.find(static_cast<char>(0), begin);
while (pos != std::string::npos) {
std::string segment = std::string(&str[begin], pos - begin);
std::wstring converted = std::wstring(segment.size() + 1, 0);
_mbstowcs_s_l(&size, &converted[0], converted.size(), &segment[0], _TRUNCATE, lc);
converted.resize(size - 1);
ret.append(converted);
ret.append({ 0 });
begin = pos + 1;
pos = str.find(static_cast<char>(0), begin);
}
if (begin < str.length()) {
std::string segment = std::string(&str[begin], str.length() - begin);
std::wstring converted = std::wstring(segment.size() + 1, 0);
_mbstowcs_s_l(&size, &converted[0], converted.size(), &segment[0], _TRUNCATE, lc);
converted.resize(size - 1);
ret.append(converted);
}
_free_locale(lc);
#elif defined LINUXLIB
std::string currentLocale = setlocale(LC_ALL, nullptr);
setlocale(LC_ALL, locale.c_str());
pos = str.find(static_cast<char>(0), begin);
while (pos != std::string::npos) {
std::string segment = std::string(&str[begin], pos - begin);
std::wstring converted = std::wstring(segment.size(), 0);
size = mbstowcs(&converted[0], &segment[0], converted.size());
converted.resize(size);
ret.append(converted);
ret.append({ 0 });
begin = pos + 1;
pos = str.find(static_cast<char>(0), begin);
}
if (begin < str.length()) {
std::string segment = std::string(&str[begin], str.length() - begin);
std::wstring converted = std::wstring(segment.size(), 0);
size = mbstowcs(&converted[0], &segment[0], converted.size());
converted.resize(size);
ret.append(converted);
}
setlocale(LC_ALL, currentLocale.c_str());
#elif defined MACOSLIB
#endif
return ret;
}
Print std::string.
Check RawString Suffix.
Linux Code. Print directly std::string using std::cout.
If you have std::wstring.
1. Convert to std::string.
2. Print with std::cout.
std::wstring x = L"\0\001日本ABC\0DE\0F\0G🐶\0"s;
std::string result = WidestringToString(x, "en_US.UTF-8");
std::cout << "RESULT=" << result << std::endl;
std::cout << "RESULT_SIZE=" << result.size() << std::endl;
On Windows, if you need to print Unicode, use WriteConsole to print Unicode characters from a std::wstring or std::string.
void WriteUnicode(const std::string& s); // forward declaration, defined below
void WriteUnicodeLine(const std::string& s)
{
#ifdef WINDOWSLIB
WriteUnicode(s);
std::cout << std::endl;
#elif defined LINUXLIB
std::cout << s << std::endl;
#elif defined MACOSLIB
#endif
}
void WriteUnicode(const std::string& s)
{
#ifdef WINDOWSLIB
std::wstring unicode = Insane::String::Strings::StringToWideString(s);
WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), unicode.c_str(), static_cast<DWORD>(unicode.length()), nullptr, nullptr);
#elif defined LINUXLIB
std::cout << s;
#elif defined MACOSLIB
#endif
}
void WriteUnicodeLineW(const std::wstring& ws)
{
#ifdef WINDOWSLIB
WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), ws.c_str(), static_cast<DWORD>(ws.length()), nullptr, nullptr);
std::cout << std::endl;
#elif defined LINUXLIB
std::cout << String::Strings::WidestringToString(ws)<<std::endl;
#elif defined MACOSLIB
#endif
}
void WriteUnicodeW(const std::wstring& ws)
{
#ifdef WINDOWSLIB
WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), ws.c_str(), static_cast<DWORD>(ws.length()), nullptr, nullptr);
#elif defined LINUXLIB
std::cout << String::Strings::WidestringToString(ws);
#elif defined MACOSLIB
#endif
}
Windows code, using the WriteUnicodeLine or WriteUnicode function. The same code can be used on Linux.
std::wstring x = L"\0\001日本ABC\0DE\0F\0G🐶\0"s;
std::string result = WidestringToString(x, "en_US.UTF-8");
WriteUnicodeLine(u8"RESULT" + result);
WriteUnicodeLine(u8"RESULT_SIZE" + std::to_string(result.size()));
Finally, on Windows you need a console with powerful and complete support for Unicode characters.
I recommend ConEmu, set as the default terminal on Windows.
Tested on Microsoft Visual Studio and JetBrains CLion:
Tested on Microsoft Visual Studio 2017 with VC++; std=c++17. (Windows Project)
Tested on Microsoft Visual Studio 2017 with g++; std=c++17. (Linux Project)
Tested on Jetbrains Clion 2018.3 with g++; std=c++17. (Linux Toolchain / Remote)
QA
Q. Why don't you use the <codecvt> header functions and classes?
A. They are deprecated. Deprecated or removed features fail to build on VC++, although g++ has no problem with them. I prefer zero warnings and no headaches.
Q. wstring on Windows are interchan.
A. They are deprecated. Deprecated or removed features fail to build on VC++, although g++ has no problem with them. I prefer zero warnings and no headaches.
Q. Is std::wstring cross-platform?
A. No. std::wstring uses wchar_t elements. On Windows wchar_t is 2 bytes and characters are stored as UTF-16 code units; a character above U+FFFF is represented by two UTF-16 units (2 wchar_t elements) called a surrogate pair. On Linux wchar_t is 4 bytes and each character is stored in one wchar_t element, so no surrogate pairs are needed. Check Standard data types on UNIX, Linux, and Windows.
Q. Is std::string cross-platform?
A. Yes. std::string uses char elements. The char type is guaranteed to have the same size in all compilers: 1 byte. Check Standard data types on UNIX, Linux, and Windows.
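The surrogate pair rule mentioned in the std::wstring answer can be written down in a few lines. This is only an illustrative sketch (to_utf16 is a hypothetical name) and it assumes the input is a valid code point.

```cpp
#include <string>

// Encode one Unicode code point as UTF-16 code units.
// Assumption: cp is a valid scalar value (at most U+10FFFF and not itself
// in the surrogate range).
std::u16string to_utf16(char32_t cp) {
    std::u16string out;
    if (cp <= 0xFFFF) {
        out += static_cast<char16_t>(cp);                      // one unit is enough
    } else {
        cp -= 0x10000;
        out += static_cast<char16_t>(0xD800 + (cp >> 10));     // high surrogate
        out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));   // low surrogate
    }
    return out;
}
```

With this, U+65E5 (日) stays a single unit, while U+1F436 (🐶) becomes the pair D83D DC36. That is why the length of a std::wstring on Windows counts UTF-16 units rather than characters.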

Running a console app from VS2017 under Win10 with UK regional settings required me to:
set VS2017 Tools > Environment > Fonts and Colors > Font: 'Lucida', for instance
save C++ source files with the encoding "Unicode (UTF-8 with signature) - Codepage 65001" so that you can type accented character literals such as L"âéïôù" without compiler warnings, yet avoid double-byte characters everywhere
compile with Configuration Properties > General > CharacterSet > "Use Multi-byte.." and the Configuration Properties > C/C++ > All Options > Additional Options > "/utf-8" flag
#include <iostream>, <io.h>, and <fcntl.h>
execute an obscure '_setmode(_fileno(stdout), _O_WTEXT);' once at the start of the app
forget 'cout <<... ;' and use only 'wcout << ... ;'
For the record, VS2015 on Win7 required a 'SetConsoleOutputCP(65001);' and allowed mixing output via wcout and cout.

If you were looking for a portable solution, which is unfortunately still not part of a standard as of C++20, I can recommend the nowide library. It comes either standalone or as part of boost. You'll find many standard counterparts consuming or emitting utf-8 encoded chars there. Yes, chars, not char8_ts (yet). Feel free to use char8_t-remediation utilities to interpret char8_ts as chars, if your program sports them already.
The requested code snippet would look like this:
#include <boost/nowide/iostream.hpp>
#include <char8_t-remediation.h>
int main()
{
using boost::nowide::cout;
cout << U8("¡Hola!") << std::endl;
}
Note: Please be aware of streams orientation issue. A short recommendation in the context of my answer would be: Use exclusively nowide streams for input / output and utf-8 encoded data.

In my case I'm reading a UTF-8 file and printing it to the console. I find wifstream works very well; even the Visual Studio debugger shows the UTF-8 words correctly (I'm reading Traditional Chinese). From this post:
#include <sstream>
#include <fstream>
#include <codecvt>
std::wstring readFile(const char* filename)
{
std::wifstream wif(filename);
wif.imbue(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t>));
std::wstringstream wss;
wss << wif.rdbuf();
return wss.str();
}
// usage
std::wstring wstr2;
wstr2 = readFile("C:\\yourUtf8File.txt");
std::wcout << wstr2; // needs #include <iostream>

Testing with VS2019 and a UNICODE console app on Win10, I found the following for Spanish and Japanese:
If you just wprintf a string, you get the wrong characters for Spanish (Japanese not tested, but I'm sure it won't work). It appears that the default "C" locale is ASCII (the traditional extended ASCII collating table for PCs).
Using setlocale(LC_ALL, ""); sets the correct code page, CP1252, when using the Spanish (Mexico) Windows language setting, and the output is good (Lucida Console font). However, Japanese output (using the Japanese Windows language setting) is suppressed (meaning no output for those characters; normal Latin characters are output).
Using _setmode(_fileno(stdout), _O_U16TEXT); output works correctly for all. However, all output is 16-bit, so redirection to a file outputs 16-bit characters.
Using printf and UTF-8 text output with SetConsoleOutputCP(CP_UTF8) also works (but not if you set it after setlocale(LC_ALL, ""); I had to remove that for the output to work).
Fonts: for the Asian characters use MS Mincho; for the others you can use Lucida Console.

Related

Why do I always get Ř instead of č? (C++, Visual Studio) [duplicate]

Hi I was trying to output unicode string to a console with iostreams and failed.
I found this: Using unicode font in c++ console app and this snippet works.
SetConsoleOutputCP(CP_UTF8);
wchar_t s[] = L"èéøÞǽлљΣæča";
int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
char* m = new char[bufferSize];
WideCharToMultiByte(CP_UTF8, 0, s, -1, m, bufferSize, NULL, NULL);
wprintf(L"%S", m);
However, I did not find any way to output unicode correctly with iostreams. Any suggestions?
This does not work:
SetConsoleOutputCP(CP_UTF8);
utf8_locale = locale(old_locale,new boost::program_options::detail::utf8_codecvt_facet());
wcout.imbue(utf8_locale);
wcout << L"¡Hola!" << endl;
EDIT
I could not find any other solution than to wrap this snippet around in a stream.
Hope, somebody has better ideas.
//Unicode output for a Windows console
ostream &operator-(ostream &stream, const wchar_t *s)
{
int bufSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
char *buf = new char[bufSize];
WideCharToMultiByte(CP_UTF8, 0, s, -1, buf, bufSize, NULL, NULL);
wprintf(L"%S", buf);
delete[] buf;
return stream;
}
ostream &operator-(ostream &stream, const wstring &s)
{
stream - s.c_str();
return stream;
}
I have verified a solution here using Visual Studio 2010. Via this MSDN article and MSDN blog post. The trick is an obscure call to _setmode(..., _O_U16TEXT).
Solution:
#include <iostream>
#include <io.h>
#include <fcntl.h>
int wmain(int argc, wchar_t* argv[])
{
_setmode(_fileno(stdout), _O_U16TEXT);
std::wcout << L"Testing unicode -- English -- Ελληνικά -- Español." << std::endl;
}
Screenshot:
Unicode Hello World in Chinese
Here is a Hello World in Chinese. Actually it is just "Hello". I tested this on Windows 10, but I think it might work since Windows Vista.
Before Windows Vista it will be hard, if you want a programmatic solution, instead of configuring the console / registry etc. Maybe have a look here if you really need to do this on Windows 7: Change console Font Windows 7
I dont want to claim this is the only solution, but this is what worked for me.
Outline
Unicode project setup
Set the console codepage to unicode
Find and use a font that supports the characters you want to display
Use the locale of the language you want to display
Use the wide character output i.e. std::wcout
1 Project Setup
I am using Visual Studio 2017 CE. I created a blank console app. The default settings are alright. But if you experience problems or you use a different ide you might want to check these:
In your project properties find configuration properties -> General -> Project Defaults -> Character Set. It should be "Use Unicode Character Set" not "Multi-Byte".
This will define _UNICODE and UNICODE preprocessor macros for you.
int wmain(int argc, wchar_t* argv[])
Also I think we should use wmain function instead of main. They both work, but in a unicode environment wmain may be more convenient.
Also my source files are UTF-16-LE encoded, which seems to be the default in Visual Studio 2017.
2 Console Codepage
This is quite obvious. We need the unicode codepage in the console.
If you want to check your default codepage, just open a console and type chcp withou any arguments.
We have to change it to 65001, which is the UTF-8 codepage. Windows Codepage Identifiers
There is a preprocessor macro for that codepage: CP_UTF8.
I needed to set both, the input and output codepage. When I omitted either one, the output was incorrect.
SetConsoleOutputCP(CP_UTF8);
SetConsoleCP(CP_UTF8);
You might also want to check the boolean return values of those functions.
3 Choose a Font
Until yet I didnt find a console font that supports every character. So I had to choose one.
If you want to output characters which are partly only available in one font and partly in another font, then I believe it is impossible to find a solution. Only maybe if there is a font out there that supports every character. But also I didnt look into how to install a font.
I think it is not possible to use two different fonts in the same console window at the same time.
How to find a compatible font?
Open your console and go to the properties of the console window by clicking the icon in the upper left of the window. Go to the Font tab, choose a font, and click OK. Then try to enter your characters in the console window. Repeat this until you find a font you can work with, then note down the name of the font.
You can also change the size of the font in the properties window.
Once you have found a size you are happy with, note down the size values that are displayed in the properties window in the section "selected font". It shows the width and height in pixels.
To actually set the font programmatically you use:
CONSOLE_FONT_INFOEX fontInfo;
// ... configure fontInfo
SetCurrentConsoleFontEx(hConsole, false, &fontInfo);
See my example at the end of this answer for details.
Or look it up in the fine manual: SetCurrentConsoleFontEx.
This function has only existed since Windows Vista.
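Instead of copying the font name and size from the properties dialog by hand, you can probably read them back from a console that is already configured the way you like. This is only a sketch: ConsoleFont and query_console_font are names I made up, while GetCurrentConsoleFontEx is the counterpart of the setter and, like it, needs Vista or later. On other platforms the function just returns an empty result.

```cpp
#include <cassert>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical result type for this illustration.
struct ConsoleFont {
    std::wstring face;  // empty when the font could not be queried
    int width = 0;      // font cell width as reported by the console
    int height = 0;     // font cell height as reported by the console
};

// Reads the font of the console attached to standard output.
ConsoleFont query_console_font() {
    ConsoleFont result;
#ifdef _WIN32
    CONSOLE_FONT_INFOEX info = {};
    info.cbSize = sizeof(info);  // must be set before the call
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    if (GetCurrentConsoleFontEx(out, FALSE, &info)) {
        result.face = info.FaceName;
        result.width = info.dwFontSize.X;
        result.height = info.dwFontSize.Y;
    }
#endif
    return result;
}
```

The values it returns are the ones you would put into FaceName and dwFontSize when calling SetCurrentConsoleFontEx.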
4 Set the locale
You will need to set the locale to that of the language whose characters you want to print.
char* a = setlocale(LC_ALL, "chinese");
The return value is interesting. It contains a string describing exactly which locale was chosen.
Just give it a try :-)
I tested with Chinese and German.
More info: setlocale
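To see what the runtime makes of a locale name without permanently changing anything, a small helper along these lines might do. resolve_locale_name is a name invented for this sketch. It relies only on standard setlocale behaviour: a null pointer argument queries the current locale, and a null return value means the requested name was rejected.

```cpp
#include <cassert>
#include <clocale>
#include <string>

// Tries to select `requested` and returns the name the C runtime reports
// for it, or an empty string if the runtime rejects the name.
// The previously active locale is restored before returning.
std::string resolve_locale_name(const char* requested) {
    // Passing nullptr only queries. Copy the result, because the returned
    // buffer may be overwritten by the next setlocale call.
    const char* current = std::setlocale(LC_ALL, nullptr);
    const std::string previous = current ? current : "C";

    const char* chosen = std::setlocale(LC_ALL, requested);
    const std::string name = chosen ? chosen : "";

    std::setlocale(LC_ALL, previous.c_str());
    return name;
}
```

Calling resolve_locale_name("chinese") or resolve_locale_name("german") on Windows should return the full name the CRT picked; which short names are accepted depends on the runtime.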
5 Use wide character output
Not much to say here.
If you want to output wide characters, use this for example:
std::wcout << L"你好" << std::endl;
Oh, and don't forget the L prefix for wide-character literals!
And if you type literal Unicode characters like this in the source file, the source file must be saved in a Unicode encoding, such as UTF-16 LE, which seems to be the default in Visual Studio. Alternatively, use Notepad++ and set the encoding to UCS-2 LE BOM.
Example
Finally I put it all together as an example:
#include <Windows.h>
#include <iostream>
#include <io.h>
#include <fcntl.h>
#include <locale.h>
#include <wincon.h>
#include <algorithm> // std::copy
int wmain(int argc, wchar_t* argv[])
{
SetConsoleTitle(L"My Console Window - 你好");
HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE);
char* a = setlocale(LC_ALL, "chinese");
SetConsoleOutputCP(CP_UTF8);
SetConsoleCP(CP_UTF8);
CONSOLE_FONT_INFOEX fontInfo = {}; // zero-initialize so fields that are not set below are not garbage
fontInfo.cbSize = sizeof(fontInfo);
fontInfo.FontFamily = 54;
fontInfo.FontWeight = 400;
fontInfo.nFont = 0;
const wchar_t myFont[] = L"KaiTi";
fontInfo.dwFontSize = { 18, 41 };
std::copy(myFont, myFont + (sizeof(myFont) / sizeof(wchar_t)), fontInfo.FaceName);
SetCurrentConsoleFontEx(hConsole, false, &fontInfo);
std::wcout << L"Hello World!" << std::endl;
std::wcout << L"你好!" << std::endl;
return 0;
}
Cheers !
Edit on 2021-11-20
You can also try the new Windows Terminal. It seems to print Unicode out of the box. You will still need to configure a font that supports your characters in the settings. It is developed by Microsoft as open source on GitHub, and you can also install it from the Microsoft Store. I successfully tried this on Windows 10.
You can use the open-source {fmt} library to portably print Unicode text, including on Windows, for example:
#include <fmt/core.h>
int main() {
fmt::print("èéøÞǽлљΣæča");
}
Output:
èéøÞǽлљΣæča
This requires compiling with the /utf-8 compiler option in MSVC.
I don't recommend using wcout because it is non-portable and doesn't even work on Windows without extra effort. For example:
std::wcout << L"èéøÞǽлљΣæča";
will print:
├и├й├╕├Ю╟╜╨╗╤Щ╬г├ж─Нa
on Russian Windows (ACP 1251, console CP 866).
Disclaimer: I'm the author of {fmt}.
The wcout must have the locale set differently to the CRT. Here's how it can be fixed:
int _tmain(int argc, _TCHAR* argv[])
{
char* locale = setlocale(LC_ALL, "English"); // Get the CRT's current locale.
std::locale lollocale(locale);
setlocale(LC_ALL, locale); // Restore the CRT.
std::wcout.imbue(lollocale); // Now set the std::wcout to have the locale that we got from the CRT.
std::wcout << L"¡Hola!";
std::cin.get();
return 0;
}
I just tested it, and it displays the string here absolutely fine.
SetConsoleCP() and chcp do not do the same thing!
Take this program snippet:
SetConsoleCP(65001); // 65001 = UTF-8
static const char s[]="tränenüberströmt™\n";
DWORD slen=lstrlenA(s); // s is a narrow string, so use the ANSI variant explicitly
WriteConsoleA(GetStdHandle(STD_OUTPUT_HANDLE),s,slen,&slen,NULL);
The source code must be saved as UTF-8 without BOM (Byte Order Mark; Signature). Then, the Microsoft compiler cl.exe takes the UTF-8 strings as-is.
If this code is saved with a BOM, cl.exe transcodes the string to ANSI (i.e. CP1252), which doesn't match CP65001 (= UTF-8).
Change the display font to Lucida Console; otherwise, UTF-8 output will not work at all.
Type: chcp
Answer: 850
Type: test.exe
Answer: tr├ñnen├╝berstr├ÂmtÔäó
Type: chcp
Answer: 65001 - This setting was changed by SetConsoleCP(), but with no useful effect.
Type: chcp 65001
Type: test.exe
Answer: tränenüberströmt™ - All OK now.
Tested with: German Windows XP SP3
Solution 1: use WCHAR
One thing that always works: use wide characters everywhere. For example:
const wchar_t* str = L"你好\n";
DWORD nwritten = 0;
WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), str, 3, &nwritten, NULL);
Unicode is language-neutral. You can use any language and won't have encoding issues. You want to use UTF-8? Fine. Use MultiByteToWideChar to convert it to a wide-character string first.
Before you continue reading the other solution below, please note that this one has a unique advantage: it doesn't depend on the system or the user's locale settings.
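The UTF-8 conversion step might look roughly like this. widen_utf8 is a made-up name. On Windows it uses the usual two-call MultiByteToWideChar pattern (first ask for the length, then convert). The #else branch is not part of the Windows solution: it is a simplified hand-written decoder, assuming a 32-bit wchar_t, so that the snippet can also be compiled elsewhere. It does not reject overlong encodings.

```cpp
#include <cassert>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Converts UTF-8 text to a wide string; returns an empty string on bad input.
std::wstring widen_utf8(const std::string& utf8) {
    if (utf8.empty()) return std::wstring();
#ifdef _WIN32
    const int bytes = static_cast<int>(utf8.size());
    // First call: ask how many wchar_t units are needed.
    const int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                        utf8.data(), bytes, nullptr, 0);
    if (len <= 0) return std::wstring();
    std::wstring wide(static_cast<size_t>(len), L'\0');
    // Second call: do the actual conversion into the buffer.
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        utf8.data(), bytes, &wide[0], len);
    return wide;
#else
    // Stand-in for other platforms: one code point per wchar_t.
    std::wstring wide;
    size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        size_t extra = 0;
        unsigned long cp = 0;
        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else return std::wstring();                      // invalid lead byte
        if (utf8.size() - i <= extra) return std::wstring();  // truncated
        for (size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80) return std::wstring();  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        wide.push_back(static_cast<wchar_t>(cp));
        i += extra + 1;
    }
    return wide;
#endif
}
```

The result can be passed to WriteConsoleW together with its length, for example widen_utf8("\xE4\xBD\xA0\xE5\xA5\xBD") for the two characters 你好.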
Solution 2: Set system locale and user locale properly. And they should be the same.
I assume a UTF-8 locale for Windows isn't in the picture yet. Then you need to know which language (Chinese, French?) you will use, and change your system settings to match it. There is a system-level setting:
And a user-level setting:
Please set both of them to the same language.
Then, in your program, insert setlocale(LC_ALL, ""); into your main function. This is a universal rule: no matter which OS you use, whenever you want the standard library to process character sets other than ASCII, you should have this line of code. Otherwise, the locale defaults to "C", which only covers ASCII. Then you can start to use std::wcout and C functions like fputws.
I don't think there is an easy answer. Looking at Console Code Pages and the SetConsoleCP function, it seems that you will need to set up an appropriate codepage for the character set you're going to output.
Recently I wanted to stream Unicode from Python to the Windows console, and here is the minimum I needed to do:
You should set the console font to one that covers Unicode symbols. There is not a wide choice: Console properties > Font > Lucida Console
You should change the current console codepage: run chcp 65001 in the console or use the corresponding function in the C++ code
Write to the console using WriteConsoleW
Have a look at an interesting article about Java Unicode on the Windows console
Besides, in Python you cannot write to the default sys.stdout in this case; you need to substitute it with something that uses os.write(1, binarystring) or a direct call to a wrapper around WriteConsoleW. It seems that in C++ you will need to do the same.
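A C++ version of that substitution could look something like the following sketch. write_utf8 is a hypothetical helper, not something from a library. It takes UTF-8 text, uses WriteConsoleW when standard output really is a console, and otherwise passes the bytes through unchanged so that redirection to a file or pipe keeps working. Testing for a console with GetConsoleMode is an assumption of this sketch; it is a common way to do it.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Writes UTF-8 text to standard output. Returns true on success.
bool write_utf8(const std::string& utf8) {
    if (utf8.empty()) return true;
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    // GetConsoleMode succeeds only for a real console, not for a file or pipe.
    if (out != nullptr && out != INVALID_HANDLE_VALUE && GetConsoleMode(out, &mode)) {
        const int bytes = static_cast<int>(utf8.size());
        const int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), bytes, nullptr, 0);
        if (len <= 0) return false;
        std::wstring wide(static_cast<size_t>(len), L'\0');
        MultiByteToWideChar(CP_UTF8, 0, utf8.data(), bytes, &wide[0], len);
        std::fflush(stdout);  // keep ordering with earlier stdio output
        DWORD written = 0;
        return WriteConsoleW(out, wide.data(), static_cast<DWORD>(len),
                             &written, nullptr) != 0;
    }
#endif
    // Redirected output (or a non-Windows build): write the bytes as they are.
    const size_t n = std::fwrite(utf8.data(), 1, utf8.size(), stdout);
    return n == utf8.size() && std::fflush(stdout) == 0;
}
```

With this, the console font is the only thing left to configure; the console codepage does not matter for the WriteConsoleW path.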
First, sorry, I probably don't have the required fonts, so I cannot test this yet.
Something looks a bit fishy here:
// the following is said to be working
SetConsoleOutputCP(CP_UTF8); // output is in UTF8
wchar_t s[] = L"èéøÞǽлљΣæča";
int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
char* m = new char[bufferSize];
WideCharToMultiByte(CP_UTF8, 0, s, -1, m, bufferSize, NULL, NULL);
wprintf(L"%S", m); // <-- upper case %S in wprintf() is used for MultiByte/utf-8
// lower case %s in wprintf() is used for WideChar
printf("%s", m); // <-- does this work as well? try it to verify my assumption
while
// the following is said to have problem
SetConsoleOutputCP(CP_UTF8);
utf8_locale = locale(old_locale,
new boost::program_options::detail::utf8_codecvt_facet());
wcout.imbue(utf8_locale);
wcout << L"¡Hola!" << endl; // <-- you are passing wide char.
// have you tried passing the multibyte equivalent by converting to utf8 first?
int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
char* m = new char[bufferSize];
WideCharToMultiByte(CP_UTF8, 0, s, -1, m, bufferSize, NULL, NULL);
cout << m << endl;
what about
// without setting locale to UTF8, you pass WideChars
wcout << L"¡Hola!" << endl;
// set locale to UTF8 and use cout
SetConsoleOutputCP(CP_UTF8);
cout << utf8_encoded_by_converting_using_WideCharToMultiByte << endl;
There are a few issues with msvcrt and iostreams.
The trick _setmode(_fileno(stdout), _O_U16TEXT); works only with MS VC++, not MinGW-GCC. Moreover, it sometimes leads to crashes, depending on the Windows configuration.
SetConsoleCP(65001) for UTF-8 may fail in many multibyte character scenarios, but it is always OK for UTF-16LE.
You need to restore the previous console codepage on application exit.
The Windows console supports Unicode with the ReadConsole and WriteConsole functions in UTF-16LE mode. The side effect is that piping will not work in this case, i.e. myapp.exe >> ret.log produces a 0-byte ret.log file. If you are OK with this, you can try my library as follows.
const char* umessage = "Hello!\nПривет!\nПривіт!\nΧαιρετίσματα!\nHelló!\nHallå!\n";
...
#include <console.hpp>
#include <ios>
...
std::ostream& cout = io::console::out_stream();
cout << umessage
<< 1234567890ull << '\n'
<< 123456.78e+09 << '\n'
<< 12356.789e+10L << '\n'
<< std::hex << 0xCAFEBABE
<< std::endl;
The library will auto-convert your UTF-8 into UTF-16LE and write it to the console using WriteConsole. There are error and input streams as well.
Another benefit of the library is colors.
Link to the example app:
https://github.com/incoder1/IO/tree/master/examples/iostreams
The library homepage:
https://github.com/incoder1/IO
Screenshot:
Correctly displaying Western European characters in the windows console
Long story short:
use chcp to find which codepage works for you. In my case it was chcp 28591 for Western Europe.
optionally make it the default: REG ADD HKCU\Console /v CodePage /t REG_DWORD /d 28591
History of the discovery
I had a similar problem, with Java. It is just cosmetic, since it involves log lines sent to the console; but it is still annoying.
The output from our Java application is supposed to be in UTF-8 and it displays correctly in eclipse's console. But in windows console, it just shows the ASCII box-drawing characters: Inicializaci├│n and art├¡culos instead of Inicialización and artículos.
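The garbage is easier to understand once you look at the raw bytes. In UTF-8 the letter ó is the two bytes C3 B3, and an OEM codepage such as 850 shows byte C3 as ├ and byte B3 as │, which is exactly the ├│ pair above. A tiny helper like this one (hex_bytes is just a name I picked for the illustration) lets you dump the bytes of any string:

```cpp
#include <cassert>
#include <string>

// Formats each byte of the input as two uppercase hex digits,
// separated by single spaces.
std::string hex_bytes(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i != 0) out.push_back(' ');
        out.push_back(digits[b >> 4]);    // high nibble
        out.push_back(digits[b & 0x0F]);  // low nibble
    }
    return out;
}
```

For example, hex_bytes("\xC3\xB3") gives "C3 B3". If the console shows two box-drawing characters where you expected one accented letter, the program is most likely emitting UTF-8 and the console is decoding it with a single-byte codepage.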
I stumbled upon a related question and mixed some of the answers to get to the solution that worked for me. The solution is to change the codepage used by the console and to use a font that supports Unicode (like Consolas or Lucida Console). You can select the font in the system menu of the Windows console:
Start a console by any one of
Win + R then type cmd and hit the Return key.
Hit the Win key and type cmd followed by the return key.
Open the system menu by any one of
click the upper left corner icon
Hit the Alt + Space key combination
then select "Default" to change the behavior of all subsequent console windows
click the "Font" tab
Select Consolas or Lucida console
Click OK
Regarding the codepage, for a one-off case, you can get it done with the command chcp and then you have to investigate which codepage is correct for your set of characters. Several answers suggested UTF-8 codepage, which is 65001, but that codepage didn't work for my Spanish characters.
Another answer suggested a batch script for interactively selecting the codepage you want from a list. There I found the codepage for ISO-8859-1 that I needed: 28591. So you could execute
chcp 28591
before each execution of your application. You might check which code page is right for you in the Code Page Identifiers MSDN page.
Yet another answer indicated how to persist the selected codepage as the default for your windows console. It involves changing the registry, so consider yourself warned that you might brick your machine by using this solution.
REG ADD HKCU\Console /v CodePage /t REG_DWORD /d 28591
This creates the CodePage value with the 28591 data inside the HKCU\Console registry key. And that did work for me.
Please note that HKCU ("HKEY_CURRENT_USER") is only for the current user. If you want to change it for all users in that computer, you'll need to use the regedit utility and find/create the corresponding Console key (probably you'll have to create a Console key inside HKEY_USERS\.DEFAULT)
Default encoding on:
Windows UTF-16.
Linux UTF-8.
MacOS UTF-8.
My solution, in steps. It handles embedded null characters \0 (so strings are not truncated) and does not use functions from the windows.h header:
Add Macros to detect Platform.
#if defined (_WIN32)
#define WINDOWSLIB 1
#elif defined (__ANDROID__) || defined(ANDROID)//Android
#define ANDROIDLIB 1
#elif defined (__APPLE__)//iOS, Mac OS
#define MACOSLIB 1
#elif defined (__LINUX__) || defined(__gnu_linux__) || defined(__linux__)//_Ubuntu - Fedora - Centos - RedHat
#define LINUXLIB 1
#endif
Create conversion functions from std::wstring to std::string and vice versa.
#include <locale>
#include <iostream>
#include <string>
#ifdef WINDOWSLIB
#include <Windows.h>
#endif
using namespace std::literals::string_literals;
// Convert std::wstring to std::string
std::string WidestringToString(const std::wstring& wstr, const std::string& locale)
{
if (wstr.empty())
{
return std::string();
}
size_t pos;
size_t begin = 0;
std::string ret;
size_t size;
#ifdef WINDOWSLIB
_locale_t lc = _create_locale(LC_ALL, locale.c_str());
pos = wstr.find(static_cast<wchar_t>(0), begin);
while (pos != std::wstring::npos && begin < wstr.length())
{
std::wstring segment = std::wstring(&wstr[begin], pos - begin);
_wcstombs_s_l(&size, nullptr, 0, &segment[0], _TRUNCATE, lc);
std::string converted = std::string(size, 0);
_wcstombs_s_l(&size, &converted[0], size, &segment[0], _TRUNCATE, lc);
ret.append(converted);
begin = pos + 1;
pos = wstr.find(static_cast<wchar_t>(0), begin);
}
if (begin <= wstr.length()) {
std::wstring segment = std::wstring(&wstr[begin], wstr.length() - begin);
_wcstombs_s_l(&size, nullptr, 0, &segment[0], _TRUNCATE, lc);
std::string converted = std::string(size, 0);
_wcstombs_s_l(&size, &converted[0], size, &segment[0], _TRUNCATE, lc);
converted.resize(size - 1);
ret.append(converted);
}
_free_locale(lc);
#elif defined LINUXLIB
std::string currentLocale = setlocale(LC_ALL, nullptr);
setlocale(LC_ALL, locale.c_str());
pos = wstr.find(static_cast<wchar_t>(0), begin);
while (pos != std::wstring::npos && begin < wstr.length())
{
std::wstring segment = std::wstring(&wstr[begin], pos - begin);
size = wcstombs(nullptr, segment.c_str(), 0);
std::string converted = std::string(size, 0);
wcstombs(&converted[0], segment.c_str(), converted.size());
ret.append(converted);
ret.append({ 0 });
begin = pos + 1;
pos = wstr.find(static_cast<wchar_t>(0), begin);
}
if (begin <= wstr.length()) {
std::wstring segment = std::wstring(&wstr[begin], wstr.length() - begin);
size = wcstombs(nullptr, segment.c_str(), 0);
std::string converted = std::string(size, 0);
wcstombs(&converted[0], segment.c_str(), converted.size());
ret.append(converted);
}
setlocale(LC_ALL, currentLocale.c_str());
#elif defined MACOSLIB
#endif
return ret;
}
// Convert std::string to std::wstring
std::wstring StringToWideString(const std::string& str, const std::string& locale)
{
if (str.empty())
{
return std::wstring();
}
size_t pos;
size_t begin = 0;
std::wstring ret;
size_t size;
#ifdef WINDOWSLIB
_locale_t lc = _create_locale(LC_ALL, locale.c_str());
pos = str.find(static_cast<char>(0), begin);
while (pos != std::string::npos) {
std::string segment = std::string(&str[begin], pos - begin);
std::wstring converted = std::wstring(segment.size() + 1, 0);
_mbstowcs_s_l(&size, &converted[0], converted.size(), &segment[0], _TRUNCATE, lc);
converted.resize(size - 1);
ret.append(converted);
ret.append({ 0 });
begin = pos + 1;
pos = str.find(static_cast<char>(0), begin);
}
if (begin < str.length()) {
std::string segment = std::string(&str[begin], str.length() - begin);
std::wstring converted = std::wstring(segment.size() + 1, 0);
_mbstowcs_s_l(&size, &converted[0], converted.size(), &segment[0], _TRUNCATE, lc);
converted.resize(size - 1);
ret.append(converted);
}
_free_locale(lc);
#elif defined LINUXLIB
std::string currentLocale = setlocale(LC_ALL, nullptr);
setlocale(LC_ALL, locale.c_str());
pos = str.find(static_cast<char>(0), begin);
while (pos != std::string::npos) {
std::string segment = std::string(&str[begin], pos - begin);
std::wstring converted = std::wstring(segment.size(), 0);
size = mbstowcs(&converted[0], &segment[0], converted.size());
converted.resize(size);
ret.append(converted);
ret.append({ 0 });
begin = pos + 1;
pos = str.find(static_cast<char>(0), begin);
}
if (begin < str.length()) {
std::string segment = std::string(&str[begin], str.length() - begin);
std::wstring converted = std::wstring(segment.size(), 0);
size = mbstowcs(&converted[0], &segment[0], converted.size());
converted.resize(size);
ret.append(converted);
}
setlocale(LC_ALL, currentLocale.c_str());
#elif defined MACOSLIB
#endif
return ret;
}
Print std::string.
Check RawString Suffix.
Linux Code. Print directly std::string using std::cout.
If you have std::wstring.
1. Convert to std::string.
2. Print with std::cout.
std::wstring x = L"\0\001日本ABC\0DE\0F\0G🐶\0"s;
std::string result = WidestringToString(x, "en_US.UTF-8");
std::cout << "RESULT=" << result << std::endl;
std::cout << "RESULT_SIZE=" << result.size() << std::endl;
On Windows, if you need to print Unicode, use WriteConsole to print Unicode characters from a std::wstring or std::string.
void WriteUnicodeLine(const std::string& s)
{
#ifdef WINDOWSLIB
WriteUnicode(s);
std::cout << std::endl;
#elif defined LINUXLIB
std::cout << s << std::endl;
#elif defined MACOSLIB
#endif
}
void WriteUnicode(const std::string& s)
{
#ifdef WINDOWSLIB
std::wstring unicode = Insane::String::Strings::StringToWideString(s);
WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), unicode.c_str(), static_cast<DWORD>(unicode.length()), nullptr, nullptr);
#elif defined LINUXLIB
std::cout << s;
#elif defined MACOSLIB
#endif
}
void WriteUnicodeLineW(const std::wstring& ws)
{
#ifdef WINDOWSLIB
WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), ws.c_str(), static_cast<DWORD>(ws.length()), nullptr, nullptr);
std::cout << std::endl;
#elif defined LINUXLIB
std::cout << String::Strings::WidestringToString(ws)<<std::endl;
#elif defined MACOSLIB
#endif
}
void WriteUnicodeW(const std::wstring& ws)
{
#ifdef WINDOWSLIB
WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), ws.c_str(), static_cast<DWORD>(ws.length()), nullptr, nullptr);
#elif defined LINUXLIB
std::cout << String::Strings::WidestringToString(ws);
#elif defined MACOSLIB
#endif
}
Windows code, using the WriteUnicodeLine or WriteUnicode function. The same code can be used on Linux.
std::wstring x = L"\0\001日本ABC\0DE\0F\0G🐶\0"s;
std::string result = WidestringToString(x, "en_US.UTF-8");
WriteUnicodeLine(u8"RESULT" + result);
WriteUnicodeLine(u8"RESULT_SIZE" + std::to_string(result.size()));
Finally, on Windows you need powerful and complete support for Unicode characters in the console.
I recommend ConEmu, set as the default terminal on Windows.
Tested with Microsoft Visual Studio and JetBrains CLion.
Tested on Microsoft Visual Studio 2017 with VC++; std=c++17. (Windows Project)
Tested on Microsoft Visual Studio 2017 with g++; std=c++17. (Linux Project)
Tested on Jetbrains Clion 2018.3 with g++; std=c++17. (Linux Toolchain / Remote)
QA
Q. Why don't you use the <codecvt> header functions and classes?
A. They are deprecated. Removed or deprecated features make the build fail on VC++, though there are no problems on g++. I prefer zero warnings and no headaches.
Q. wstring on Windows are interchan.
A. They are deprecated. Removed or deprecated features make the build fail on VC++, though there are no problems on g++. I prefer zero warnings and no headaches.
Q. Is std::wstring cross-platform?
A. No. std::wstring uses wchar_t elements. On Windows, wchar_t is 2 bytes and each character is stored as UTF-16 code units; a character above U+FFFF is represented by two UTF-16 units (2 wchar_t elements) called a surrogate pair. On Linux, wchar_t is 4 bytes and each character is stored in one wchar_t element, so no surrogate pairs are needed. Check Standard data types on UNIX, Linux, and Windows.
Q. Is std::string cross-platform?
A. Yes. std::string uses char elements. The char type is guaranteed to have the same size in all compilers: 1 byte. Check Standard data types on UNIX, Linux, and Windows.
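To make the surrogate pair point concrete, here is a minimal sketch of how one code point is turned into UTF-16 units. to_utf16 is a name chosen for this example and uses char16_t, so it behaves the same regardless of the platform's wchar_t size.

```cpp
#include <cassert>
#include <string>

// Encodes one Unicode code point as UTF-16 code units.
std::u16string to_utf16(char32_t cp) {
    assert(cp <= 0x10FFFF);              // must be inside the Unicode range
    assert(cp < 0xD800 || cp > 0xDFFF);  // surrogate values are not characters
    std::u16string units;
    if (cp < 0x10000) {
        // Fits in a single 16-bit unit.
        units.push_back(static_cast<char16_t>(cp));
    } else {
        // Subtract 0x10000 and split the remaining 20 bits into two halves.
        const char32_t v = cp - 0x10000;
        units.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high surrogate
        units.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
    }
    return units;
}
```

The character 日 (U+65E5) needs one unit, while 🐶 (U+1F436) becomes the pair D83D DC36. That is why the same text has a different length in a std::wstring on Windows than on Linux.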
Running a console app from VS2017 under Win10 with UK regional settings required me to:
set VS2017 Tools > Environment > Fonts and Colors > Font: 'Lucida', for instance
save C++ source files with encoding "Unicode (UTF-8 with signature) - Codepage 65001" so that you can type in accented character literals L"âéïôù" without compiler warnings, yet avoid double-byte characters everywhere
compile with Configuration Properties > General > CharacterSet > "Use Multi-byte.." and the Configuration Properties > C/C++ > All Options > Additional Options > "/utf-8" flag
#include <iostream>, <io.h>, and <fcntl.h>
execute an obscure '_setmode(_fileno(stdout), _O_WTEXT);' once at the start of the app
forget 'cout <<... ;' and use only 'wcout << ... ;'
For the record, VS2015 on Win7 required a 'SetConsoleOutputCP(65001);' and allowed mixing output via wcout and cout.
If you were looking for a portable solution, which is unfortunately still not part of a standard as of C++20, I can recommend the nowide library. It comes either standalone or as part of boost. You'll find many standard counterparts consuming or emitting utf-8 encoded chars there. Yes, chars, not char8_ts (yet). Feel free to use char8_t-remediation utilities to interpret char8_ts as chars, if your program sports them already.
The requested code snippet would look like this:
#include <boost/nowide/iostream.hpp>
#include <char8_t-remediation.h>
int main()
{
using boost::nowide::cout;
cout << U8("¡Hola!") << std::endl;
}
Note: Please be aware of streams orientation issue. A short recommendation in the context of my answer would be: Use exclusively nowide streams for input / output and utf-8 encoded data.
In my case I'm reading a UTF-8 file and printing it to the console. I find that wifstream works very well; even the Visual Studio debugger shows the UTF-8 words correctly (I'm reading Traditional Chinese). From this post:
#include <sstream>
#include <fstream>
#include <codecvt>
std::wstring readFile(const char* filename)
{
std::wifstream wif(filename);
wif.imbue(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t>));
std::wstringstream wss;
wss << wif.rdbuf();
return wss.str();
}
// usage
std::wstring wstr2;
wstr2 = readFile("C:\\yourUtf8File.txt");
std::wcout << wstr2; // requires <iostream>
Testing with VS2019 with a Unicode console app on Win10, I found the following when testing Spanish and Japanese:
If you just wprintf a string, you get the wrong characters for Spanish (Japanese not tested, but I'm sure it won't work). It appears the default "C" locale is ASCII (the traditional extended ASCII collating table for PCs).
Using setlocale(LC_ALL, ""); sets the correct code page, CP1252, when using the Spanish (Mexico) Windows language setting, and the output is good (Lucida Console font). However, Japanese output (using the Japanese Windows language) is suppressed (meaning no output for those characters; normal Latin characters are output).
Using _setmode(_fileno(stdout), _O_U16TEXT); output works correctly for all. However, all output is 16-bit, so redirection to a file writes 16-bit characters.
Using printf and UTF-8 text output with SetConsoleOutputCP(CP_UTF8) also works (but not if you set it after setlocale(LC_ALL, ""); I had to remove that for the output to work).
Fonts: for the Asian characters use MS Mincho; for the others you can use Lucida Console.

Displaying Chinese character in C++ [duplicate]

Hi I was trying to output unicode string to a console with iostreams and failed.
I found this: Using unicode font in c++ console app and this snippet works.
SetConsoleOutputCP(CP_UTF8);
wchar_t s[] = L"èéøÞǽлљΣæča";
int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
char* m = new char[bufferSize];
WideCharToMultiByte(CP_UTF8, 0, s, -1, m, bufferSize, NULL, NULL);
wprintf(L"%S", m);
However, I did not find any way to output unicode correctly with iostreams. Any suggestions?
This does not work:
SetConsoleOutputCP(CP_UTF8);
utf8_locale = locale(old_locale,new boost::program_options::detail::utf8_codecvt_facet());
wcout.imbue(utf8_locale);
wcout << L"¡Hola!" << endl;
EDIT
I could not find any other solution than to wrap this snippet around in a stream.
Hope, somebody has better ideas.
//Unicode output for a Windows console
ostream &operator-(ostream &stream, const wchar_t *s)
{
int bufSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
char *buf = new char[bufSize];
WideCharToMultiByte(CP_UTF8, 0, s, -1, buf, bufSize, NULL, NULL);
wprintf(L"%S", buf);
delete[] buf;
return stream;
}
ostream &operator-(ostream &stream, const wstring &s)
{
stream - s.c_str();
return stream;
}
I have verified a solution here using Visual Studio 2010. Via this MSDN article and MSDN blog post. The trick is an obscure call to _setmode(..., _O_U16TEXT).
Solution:
#include <iostream>
#include <io.h>
#include <fcntl.h>
int wmain(int argc, wchar_t* argv[])
{
_setmode(_fileno(stdout), _O_U16TEXT);
std::wcout << L"Testing unicode -- English -- Ελληνικά -- Español." << std::endl;
}
Screenshot:
Unicode Hello World in Chinese
Here is a Hello World in Chinese. Actually it is just "Hello". I tested this on Windows 10, but I think it might work since Windows Vista.
Before Windows Vista it will be hard, if you want a programmatic solution, instead of configuring the console / registry etc. Maybe have a look here if you really need to do this on Windows 7: Change console Font Windows 7
I dont want to claim this is the only solution, but this is what worked for me.
Outline
Unicode project setup
Set the console codepage to unicode
Find and use a font that supports the characters you want to display
Use the locale of the language you want to display
Use the wide character output i.e. std::wcout
1 Project Setup
I am using Visual Studio 2017 CE. I created a blank console app. The default settings are alright. But if you experience problems or you use a different ide you might want to check these:
In your project properties find configuration properties -> General -> Project Defaults -> Character Set. It should be "Use Unicode Character Set" not "Multi-Byte".
This will define _UNICODE and UNICODE preprocessor macros for you.
int wmain(int argc, wchar_t* argv[])
Also I think we should use wmain function instead of main. They both work, but in a unicode environment wmain may be more convenient.
Also my source files are UTF-16-LE encoded, which seems to be the default in Visual Studio 2017.
2 Console Codepage
This is quite obvious. We need the unicode codepage in the console.
If you want to check your default codepage, just open a console and type chcp withou any arguments.
We have to change it to 65001, which is the UTF-8 codepage. Windows Codepage Identifiers
There is a preprocessor macro for that codepage: CP_UTF8.
I needed to set both, the input and output codepage. When I omitted either one, the output was incorrect.
SetConsoleOutputCP(CP_UTF8);
SetConsoleCP(CP_UTF8);
You might also want to check the boolean return values of those functions.
3 Choose a Font
Until yet I didnt find a console font that supports every character. So I had to choose one.
If you want to output characters which are partly only available in one font and partly in another font, then I believe it is impossible to find a solution. Only maybe if there is a font out there that supports every character. But also I didnt look into how to install a font.
I think it is not possible to use two different fonts in the same console window at the same time.
How to find a compatible font?
Open your console, go to the properties of the console window by clicking on the icon in the upper left of the window. Go to the fonts tab and choose a font and click ok. Then try to enter your characters in the console window. Repeat this until you find a font you can work with. Then note down the name of the font.
Also you can change the size of the font in the properties window.
If you found a size you are happy with, note down the size values that are displayed in the properties window in the section "selected font". It will show width and height in pixels.
To actually set the font programmatically you use:
CONSOLE_FONT_INFOEX fontInfo;
// ... configure fontInfo
SetCurrentConsoleFontEx(hConsole, false, &fontInfo);
See my example at the end of this answer for details.
Or look it up in the fine manual: SetCurrentConsoleFont.
This function only exists since Windows Vista.
4 Set the locale
You will need to set the locale to the locale of the language which characters you want to print.
char* a = setlocale(LC_ALL, "chinese");
The return value is interesting. It will contain a string to describe exactly wich locale was chosen.
Just give it a try :-)
I tested with chinese and german.
More info: setlocale
5 Use wide character output
Not much to say here.
If you want to output wide characters, use this for example:
std::wcout << L"你好" << std::endl;
Oh, and dont forget the L prefix for wide characters!
And if you type literal unicode characters like this in the source file, the source file must be unicode encoded. Like the default in Visual Studio is UTF-16-LE. Or maybe use notepad++ and set the encoding to UCS-2 LE BOM.
Example
Finally I put it all together as an example:
#include <Windows.h>
#include <iostream>
#include <io.h>
#include <fcntl.h>
#include <locale.h>
#include <wincon.h>
int wmain(int argc, wchar_t* argv[])
{
SetConsoleTitle(L"My Console Window - 你好");
HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE);
char* a = setlocale(LC_ALL, "chinese");
SetConsoleOutputCP(CP_UTF8);
SetConsoleCP(CP_UTF8);
CONSOLE_FONT_INFOEX fontInfo;
fontInfo.cbSize = sizeof(fontInfo);
fontInfo.FontFamily = 54;
fontInfo.FontWeight = 400;
fontInfo.nFont = 0;
const wchar_t myFont[] = L"KaiTi";
fontInfo.dwFontSize = { 18, 41 };
std::copy(myFont, myFont + (sizeof(myFont) / sizeof(wchar_t)), fontInfo.FaceName);
SetCurrentConsoleFontEx(hConsole, false, &fontInfo);
std::wcout << L"Hello World!" << std::endl;
std::wcout << L"你好!" << std::endl;
return 0;
}
Cheers !
Edit on 2021-11-20
Maybe you can also try to use the new Windows Terminal. It seems to print unicode out of the box. You will still need to configure a font that supports your characters in the settings. It is developed by Microsoft as OpenSource on github and you can also install it from the Microsoft Store. I successfully tried this on Windows 10.
You can use the open-source {fmt} library to portably print Unicode text, including on Windows, for example:
#include <fmt/core.h>
int main() {
fmt::print("èéøÞǽлљΣæča");
}
Output:
èéøÞǽлљΣæča
This requires compiling with the /utf-8 compiler option in MSVC.
I don't recommend using wcout because it is non-portable and doesn't even work on Windows without extra efforts, for example:
std::wcout << L"èéøÞǽлљΣæča";
will print:
├и├й├╕├Ю╟╜╨╗╤Щ╬г├ж─Нa
in Russian Windows (ACP 1251, console CP 866).
Disclaimer: I'm the author of {fmt}.
The wcout must have the locale set differently to the CRT. Here's how it can be fixed:
int _tmain(int argc, _TCHAR* argv[])
{
char* locale = setlocale(LC_ALL, "English"); // Get the CRT's current locale.
std::locale lollocale(locale);
setlocale(LC_ALL, locale); // Restore the CRT.
std::wcout.imbue(lollocale); // Now set the std::wcout to have the locale that we got from the CRT.
std::wcout << L"¡Hola!";
std::cin.get();
return 0;
}
I just tested it, and it displays the string here absolutely fine.
SetConsoleCP() and chcp does not the same!
Take this program snippet:
SetConsoleCP(65001) // 65001 = UTF-8
static const char s[]="tränenüberströmt™\n";
DWORD slen=lstrlen(s);
WriteConsoleA(GetStdHandle(STD_OUTPUT_HANDLE),s,slen,&slen,NULL);
The source code must be saved as UTF-8 without BOM (Byte Order Mark; Signature). Then, the Microsoft compiler cl.exe takes the UTF-8 strings as-is.
If this code is saved with a BOM, cl.exe transcodes the string to ANSI (i.e. CP1252), which does not match CP65001 (= UTF-8).
Change the display font to Lucida Console; otherwise, UTF-8 output will not work at all.
Type: chcp
Answer: 850
Type: test.exe
Answer: tr├ñnen├╝berstr├ÂmtÔäó
Type: chcp
Answer: 65001 - This setting was changed by SetConsoleCP(), but with no useful effect.
Type: chcp 65001
Type: test.exe
Answer: tränenüberströmt™ - All OK now.
Tested with: German Windows XP SP3
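The garbled first run is what you get when the UTF-8 bytes of the string are looked up one at a time in code page 850. The following is a minimal sketch of that lookup, not a real CP850 decoder: cp850_glyph and view_as_cp850 are hypothetical helper names, and the table holds only the byte values that occur in this example.

```cpp
#include <string>

// Partial CP850 table: returns, as UTF-8, the glyph CP850 shows for one byte.
// Only the byte values needed for this example are listed (an assumption of
// the sketch); any other byte above 0x7F comes back as '?'.
std::string cp850_glyph(unsigned char byte) {
    if (byte < 0x80) return std::string(1, static_cast<char>(byte)); // ASCII is unchanged
    switch (byte) {
        case 0x84: return "\xC3\xA4";     // ä
        case 0xA2: return "\xC3\xB3";     // ó
        case 0xA4: return "\xC3\xB1";     // ñ
        case 0xB6: return "\xC3\x82";     // Â
        case 0xBC: return "\xE2\x95\x9D"; // ╝
        case 0xC3: return "\xE2\x94\x9C"; // ├
        case 0xE2: return "\xC3\x94";     // Ô
        default:   return "?";
    }
}

// Renders every byte of `bytes` the way a CP850 console would display it.
std::string view_as_cp850(const std::string& bytes) {
    std::string shown;
    for (unsigned char b : bytes) shown += cp850_glyph(b);
    return shown;
}
```

Feeding it the UTF-8 bytes of tränenüberströmt™ gives tr├ñnen├╝berstr├ÂmtÔäó, the same text as the first run above: ä is stored as the bytes C3 A4, which CP850 shows as ├ and ñ.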
Solution 1: use WCHAR
One thing that always works: use wide characters everywhere. For example:
const wchar_t* str = L"你好\n";
DWORD nwritten = 0;
WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), str, static_cast<DWORD>(lstrlenW(str)), &nwritten, NULL);
Unicode is language neutral. You can use any language and won't have encoding issues. Do you want to use UTF-8? Fine. Use MultiByteToWideChar to convert it to a wide-character string first.
Before you continue reading the other solution below, please note that this one has a unique advantage: it doesn't depend on the system or the user's locale settings.
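If your text is UTF-8, the conversion step mentioned above might look roughly like this. It is only a sketch: write_utf8 is a hypothetical helper name, and the fallback to raw bytes when standard output is not a console (or on other platforms) is my own assumption, not part of the solution above.

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Writes UTF-8 text to standard output; returns true on success.
bool write_utf8(const std::string& utf8) {
    if (utf8.empty()) return true;
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    // GetConsoleMode only succeeds when the handle is a real console.
    if (out != NULL && out != INVALID_HANDLE_VALUE && GetConsoleMode(out, &mode)) {
        std::fflush(stdout); // keep ordering with earlier stdio output
        const int bytes = static_cast<int>(utf8.size());
        const int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), bytes, NULL, 0);
        if (len <= 0) return false;
        std::wstring wide(static_cast<std::size_t>(len), L'\0');
        MultiByteToWideChar(CP_UTF8, 0, utf8.data(), bytes, &wide[0], len);
        DWORD written = 0;
        return WriteConsoleW(out, wide.data(), static_cast<DWORD>(wide.size()),
                             &written, NULL) != 0;
    }
#endif
    // Redirected output, or not Windows: emit the UTF-8 bytes unchanged.
    return std::fwrite(utf8.data(), 1, utf8.size(), stdout) == utf8.size();
}
```

The console branch never consults a code page, which is the advantage described above. The byte fallback is there because WriteConsoleW fails on a handle that is redirected to a file or pipe.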
Solution 2: Set system locale and user locale properly. And they should be the same.
I assume a UTF-8 locale for Windows isn't in the picture yet. In that case you need to know which language (Chinese, French?) you will use, and change your system settings to match it. There is a system-level setting:
And a user-level setting:
Please set both of them to the same language.
Then, in your program, add setlocale(LC_ALL, ""); to your main function. It is a universal rule, no matter which OS you use: whenever you want to use the standard library to process character sets other than ASCII, you should have this line of code. Otherwise, the locale defaults to "C", which only contains ASCII. After that you can start to use std::wcout and C functions like fputws.
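A small sketch of that setup, assuming the system and user settings already match the language you print. The helper names adopt_user_locale and print_wide_line are hypothetical, and falling back to the current locale when the environment names an unknown one is my own choice.

```cpp
#include <clocale>
#include <cstdio>
#include <cwchar>
#include <string>

// Switches the C runtime from the default "C" locale to the user's locale
// and returns the name of the locale that is active afterwards.
std::string adopt_user_locale() {
    const char* name = std::setlocale(LC_ALL, "");
    if (name == nullptr) {
        // The environment asked for a locale the runtime does not know;
        // query the locale that is still in effect instead.
        name = std::setlocale(LC_ALL, nullptr);
    }
    return name ? name : "C";
}

// Prints one line of wide text with fputws; returns true on success.
bool print_wide_line(const std::wstring& text) {
    return std::fputws(text.c_str(), stdout) >= 0 &&
           std::fputwc(L'\n', stdout) != WEOF;
}
```

Call adopt_user_locale() once at the top of main, before the first output. The returned name shows which locale the runtime actually picked, which helps when the output is still wrong.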
I don't think there is an easy answer. Looking at Console Code Pages and the SetConsoleCP function, it seems that you will need to set up an appropriate codepage for the character set you're going to output.
Recently I wanted to stream Unicode from Python to the Windows console, and here is the minimum I needed to do:
You should set the console font to one that covers Unicode symbols. There is not a wide choice: Console properties > Font > Lucida Console
You should change the current console codepage: run chcp 65001 in the console or use the corresponding function in the C++ code
Write to the console using WriteConsoleW
Look through an interesting article about Java Unicode on the Windows console
Besides, in Python you cannot write to the default sys.stdout in this case; you need to substitute it with something that uses os.write(1, binarystring) or a direct call to a wrapper around WriteConsoleW. It seems that in C++ you will need to do the same.
First, sorry I probably don't have the fonts required so I cannot test it yet.
Something looks a bit fishy here
// the following is said to be working
SetConsoleOutputCP(CP_UTF8); // output is in UTF8
wchar_t s[] = L"èéøÞǽлљΣæča";
int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
char* m = new char[bufferSize];
WideCharToMultiByte(CP_UTF8, 0, s, -1, m, bufferSize, NULL, NULL);
wprintf(L"%S", m); // <-- upper case %S in wprintf() is used for MultiByte/utf-8
// lower case %s in wprintf() is used for WideChar
printf("%s", m); // <-- does this work as well? try it to verify my assumption
while
// the following is said to have problem
SetConsoleOutputCP(CP_UTF8);
utf8_locale = locale(old_locale,
new boost::program_options::detail::utf8_codecvt_facet());
wcout.imbue(utf8_locale);
wcout << L"¡Hola!" << endl; // <-- you are passing wide char.
// have you tried passing the multibyte equivalent by converting to utf8 first?
int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
char* m = new char[bufferSize];
WideCharToMultiByte(CP_UTF8, 0, s, -1, m, bufferSize, NULL, NULL);
cout << m << endl;
what about
// without setting locale to UTF8, you pass WideChars
wcout << L"¡Hola!" << endl;
// set the console output code page to UTF-8 and use cout
SetConsoleOutputCP(CP_UTF8);
cout << utf8_encoded_by_converting_using_WideCharToMultiByte << endl;
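For the "convert to UTF-8 first" step, here is a hand-written sketch that returns a std::string, so there is no new[] buffer to leak. It is a stand-in for WideCharToMultiByte(CP_UTF8, ...) so that it builds without windows.h; wide_to_utf8 and append_utf8 are hypothetical names, and unpaired surrogates are passed through without validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Appends the UTF-8 encoding of one code point to `out`.
static void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Converts a wide string (UTF-16 on Windows, UTF-32 on Linux) to UTF-8.
std::string wide_to_utf8(const std::wstring& wide) {
    std::string out;
    for (std::size_t i = 0; i < wide.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(wide[i]);
        // With a 16-bit wchar_t, join a surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < wide.size()) {
            const std::uint32_t low = static_cast<std::uint32_t>(wide[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

With SetConsoleOutputCP(CP_UTF8) in effect, the result can then be sent through cout, for example cout << wide_to_utf8(L"¡Hola!") << endl;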
There are a few issues with msvcrt and iostreams.
The _setmode(_fileno(stdout), _O_U16TEXT); trick works only with MS VC++, not with MinGW-GCC. Moreover, it sometimes leads to crashes, depending on the Windows configuration.
SetConsoleCP(65001) selects UTF-8. It may fail in many multibyte character scenarios, whereas UTF-16LE is always OK.
You need to restore the previous console codepage when the application exits.
The Windows console supports Unicode through the ReadConsole and WriteConsole functions in UTF-16LE mode. Side effect: piping will not work in this case, i.e. myapp.exe >> ret.log produces a 0-byte ret.log file. If you are OK with this, you can try my library as follows.
const char* umessage = "Hello!\nПривет!\nПривіт!\nΧαιρετίσματα!\nHelló!\nHallå!\n";
...
#include <console.hpp>
#include <ios>
...
std::ostream& cout = io::console::out_stream();
cout << umessage
<< 1234567890ull << '\n'
<< 123456.78e+09 << '\n'
<< 12356.789e+10L << '\n'
<< std::hex << 0xCAFEBABE
<< std::endl;
The library will auto-convert your UTF-8 into UTF-16LE and write it to the console using WriteConsole. There are error and input streams as well.
Another benefit of the library: colors.
Link to the example app:
https://github.com/incoder1/IO/tree/master/examples/iostreams
The library homepage:
https://github.com/incoder1/IO
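If you set the codepage yourself instead of using a library, the advice above to restore the previous codepage on exit can be sketched as a small RAII guard. ScopedConsoleOutputCP is a hypothetical name and not part of the library. The sketch uses SetConsoleOutputCP, because that call governs output while SetConsoleCP sets the input codepage, and it only covers a normal return from the enclosing scope.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Switches the console output code page for the lifetime of the object and
// puts the previous one back in the destructor. Does nothing off Windows.
class ScopedConsoleOutputCP {
public:
    explicit ScopedConsoleOutputCP(unsigned codePage) {
#ifdef _WIN32
        previous_ = GetConsoleOutputCP(); // 0 means the query failed
        changed_ = previous_ != 0 && SetConsoleOutputCP(codePage) != 0;
#else
        (void)codePage;
#endif
    }
    ~ScopedConsoleOutputCP() {
#ifdef _WIN32
        if (changed_) SetConsoleOutputCP(previous_);
#endif
    }
    ScopedConsoleOutputCP(const ScopedConsoleOutputCP&) = delete;
    ScopedConsoleOutputCP& operator=(const ScopedConsoleOutputCP&) = delete;

    bool changed() const { return changed_; }      // true if the switch took effect
    unsigned previous() const { return previous_; } // code page to be restored

private:
    unsigned previous_ = 0;
    bool changed_ = false;
};
```

Declare one at the top of main, e.g. ScopedConsoleOutputCP guard(65001);, and the shell gets its original codepage back when main returns.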
Correctly displaying Western European characters in the windows console
Long story short:
use chcp to find which codepage works for you. In my case it was chcp 28591 for Western Europe.
optionally make it the default: REG ADD HKCU\Console /v CodePage /t REG_DWORD /d 28591
History of the discovery
I had a similar problem, with Java. It is just cosmetic, since it involves log lines sent to the console; but it is still annoying.
The output from our Java application is supposed to be in UTF-8, and it displays correctly in Eclipse's console. But in the Windows console, it shows box-drawing characters in place of the accented letters: Inicializaci├│n and art├¡culos instead of Inicialización and artículos.
I stumbled upon a related question and combined some of the answers to get to the solution that worked for me. The solution is to change the codepage used by the console and to use a font that supports Unicode (like Consolas or Lucida Console). You can select the font in the system menu of the Windows console:
1. Start a console, either with Win + R, then typing cmd and hitting the Return key, or by hitting the Win key and typing cmd followed by the Return key.
2. Open the system menu, either by clicking the icon in the upper left corner or by hitting the Alt + Space key combination.
3. Select "Defaults" to change the behavior of all subsequent console windows.
4. Click the "Font" tab.
5. Select Consolas or Lucida Console.
6. Click OK.
Regarding the codepage, for a one-off case, you can get it done with the command chcp and then you have to investigate which codepage is correct for your set of characters. Several answers suggested UTF-8 codepage, which is 65001, but that codepage didn't work for my Spanish characters.
Another answer suggested a batch script for interactively selecting the codepage you want from a list. There I found the codepage I needed for ISO-8859-1: 28591. So you could execute
chcp 28591
before each execution of your application. You might check which code page is right for you in the Code Page Identifiers MSDN page.
Yet another answer indicated how to persist the selected codepage as the default for your windows console. It involves changing the registry, so consider yourself warned that you might brick your machine by using this solution.
REG ADD HKCU\Console /v CodePage /t REG_DWORD /d 28591
This creates the CodePage value with the 28591 data inside the HKCU\Console registry key. And that did work for me.
Please note that HKCU ("HKEY_CURRENT_USER") is only for the current user. If you want to change it for all users in that computer, you'll need to use the regedit utility and find/create the corresponding Console key (probably you'll have to create a Console key inside HKEY_USERS\.DEFAULT)
Default encoding on:
Windows UTF-16.
Linux UTF-8.
MacOS UTF-8.
My solution steps. The conversion keeps embedded null characters (\0), so the result is not truncated, and the conversion itself does not use functions from the windows.h header:
Add macros to detect the platform.
#if defined (_WIN32)
#define WINDOWSLIB 1
#elif defined (__ANDROID__) || defined(ANDROID)//Android
#define ANDROIDLIB 1
#elif defined (__APPLE__)//iOS, Mac OS
#define MACOSLIB 1
#elif defined (__LINUX__) || defined(__gnu_linux__) || defined(__linux__)//_Ubuntu - Fedora - Centos - RedHat
#define LINUXLIB 1
#endif
Create conversion functions from std::wstring to std::string and vice versa.
#include <locale>
#include <iostream>
#include <string>
#ifdef WINDOWSLIB
#include <Windows.h>
#endif
using namespace std::literals::string_literals;
// Convert std::wstring to std::string
std::string WidestringToString(const std::wstring& wstr, const std::string& locale)
{
if (wstr.empty())
{
return std::string();
}
size_t pos;
size_t begin = 0;
std::string ret;
size_t size;
#ifdef WINDOWSLIB
_locale_t lc = _create_locale(LC_ALL, locale.c_str());
pos = wstr.find(static_cast<wchar_t>(0), begin);
while (pos != std::wstring::npos && begin < wstr.length())
{
std::wstring segment = std::wstring(&wstr[begin], pos - begin);
_wcstombs_s_l(&size, nullptr, 0, &segment[0], _TRUNCATE, lc);
std::string converted = std::string(size, 0);
_wcstombs_s_l(&size, &converted[0], size, &segment[0], _TRUNCATE, lc);
ret.append(converted);
begin = pos + 1;
pos = wstr.find(static_cast<wchar_t>(0), begin);
}
if (begin <= wstr.length()) {
std::wstring segment = std::wstring(&wstr[begin], wstr.length() - begin);
_wcstombs_s_l(&size, nullptr, 0, &segment[0], _TRUNCATE, lc);
std::string converted = std::string(size, 0);
_wcstombs_s_l(&size, &converted[0], size, &segment[0], _TRUNCATE, lc);
converted.resize(size - 1);
ret.append(converted);
}
_free_locale(lc);
#elif defined LINUXLIB
std::string currentLocale = setlocale(LC_ALL, nullptr);
setlocale(LC_ALL, locale.c_str());
pos = wstr.find(static_cast<wchar_t>(0), begin);
while (pos != std::wstring::npos && begin < wstr.length())
{
std::wstring segment = std::wstring(&wstr[begin], pos - begin);
size = wcstombs(nullptr, segment.c_str(), 0);
std::string converted = std::string(size, 0);
wcstombs(&converted[0], segment.c_str(), converted.size());
ret.append(converted);
ret.append({ 0 });
begin = pos + 1;
pos = wstr.find(static_cast<wchar_t>(0), begin);
}
if (begin <= wstr.length()) {
std::wstring segment = std::wstring(&wstr[begin], wstr.length() - begin);
size = wcstombs(nullptr, segment.c_str(), 0);
std::string converted = std::string(size, 0);
wcstombs(&converted[0], segment.c_str(), converted.size());
ret.append(converted);
}
setlocale(LC_ALL, currentLocale.c_str());
#elif defined MACOSLIB
#endif
return ret;
}
// Convert std::string to std::wstring
std::wstring StringToWideString(const std::string& str, const std::string& locale)
{
if (str.empty())
{
return std::wstring();
}
size_t pos;
size_t begin = 0;
std::wstring ret;
size_t size;
#ifdef WINDOWSLIB
_locale_t lc = _create_locale(LC_ALL, locale.c_str());
pos = str.find(static_cast<char>(0), begin);
while (pos != std::string::npos) {
std::string segment = std::string(&str[begin], pos - begin);
std::wstring converted = std::wstring(segment.size() + 1, 0);
_mbstowcs_s_l(&size, &converted[0], converted.size(), &segment[0], _TRUNCATE, lc);
converted.resize(size - 1);
ret.append(converted);
ret.append({ 0 });
begin = pos + 1;
pos = str.find(static_cast<char>(0), begin);
}
if (begin < str.length()) {
std::string segment = std::string(&str[begin], str.length() - begin);
std::wstring converted = std::wstring(segment.size() + 1, 0);
_mbstowcs_s_l(&size, &converted[0], converted.size(), &segment[0], _TRUNCATE, lc);
converted.resize(size - 1);
ret.append(converted);
}
_free_locale(lc);
#elif defined LINUXLIB
std::string currentLocale = setlocale(LC_ALL, nullptr);
setlocale(LC_ALL, locale.c_str());
pos = str.find(static_cast<char>(0), begin);
while (pos != std::string::npos) {
std::string segment = std::string(&str[begin], pos - begin);
std::wstring converted = std::wstring(segment.size(), 0);
size = mbstowcs(&converted[0], &segment[0], converted.size());
converted.resize(size);
ret.append(converted);
ret.append({ 0 });
begin = pos + 1;
pos = str.find(static_cast<char>(0), begin);
}
if (begin < str.length()) {
std::string segment = std::string(&str[begin], str.length() - begin);
std::wstring converted = std::wstring(segment.size(), 0);
size = mbstowcs(&converted[0], &segment[0], converted.size());
converted.resize(size);
ret.append(converted);
}
setlocale(LC_ALL, currentLocale.c_str());
#elif defined MACOSLIB
#endif
return ret;
}
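The core idea of both functions, splitting at embedded \0 characters and converting each segment separately, can be condensed into a few lines. This is a simplified sketch and not a replacement for the code above: narrow_keep_nuls is a hypothetical name, it uses whatever C locale is currently active instead of a named one, and it throws where the code above does no error checking. On MSVC, plain wcstombs triggers a deprecation warning, which is why the code above uses the _s variants.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Converts a wide string to a multibyte string in the current C locale,
// keeping embedded L'\0' characters as '\0' bytes.
std::string narrow_keep_nuls(const std::wstring& wide) {
    std::string out;
    std::size_t begin = 0;
    while (begin <= wide.size()) {
        // A segment runs up to the next embedded NUL, or to the end.
        std::size_t end = wide.find(L'\0', begin);
        if (end == std::wstring::npos) end = wide.size();
        const std::wstring segment = wide.substr(begin, end - begin);

        // First call measures the converted length, second call converts.
        const std::size_t needed = std::wcstombs(nullptr, segment.c_str(), 0);
        if (needed == static_cast<std::size_t>(-1))
            throw std::runtime_error("character not representable in this locale");
        std::string converted(needed, '\0');
        std::wcstombs(&converted[0], segment.c_str(), needed);
        out += converted;

        if (end == wide.size()) break; // that was the last segment
        out.push_back('\0');           // re-insert the NUL that split the segments
        begin = end + 1;
    }
    return out;
}
```

Because wcstombs stops at the first \0, converting the whole string in one call would silently drop everything after it; the loop is what prevents the truncation.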
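Print std::string.
Check the s string-literal suffix (from std::literals::string_literals); it keeps the embedded \0 characters in the literals below.
Linux code. Print the std::string directly using std::cout.
If you have a std::wstring:
1. Convert it to std::string.
2. Print it with std::cout.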
std::wstring x = L"\0\001日本ABC\0DE\0F\0G🐶\0"s;
std::string result = WidestringToString(x, "en_US.UTF-8");
std::cout << "RESULT=" << result << std::endl;
std::cout << "RESULT_SIZE=" << result.size() << std::endl;
On Windows, printing Unicode requires WriteConsole, whether the characters come from a std::wstring or a std::string.
void WriteUnicode(const std::string& s); // Forward declaration; defined below.
void WriteUnicodeLine(const std::string& s)
{
#ifdef WINDOWSLIB
    WriteUnicode(s);
    std::cout << std::endl;
#elif defined LINUXLIB
    std::cout << s << std::endl;
#elif defined MACOSLIB
#endif
}
void WriteUnicode(const std::string& s)
{
#ifdef WINDOWSLIB
    // The locale name is an assumption; it matches the one used in the examples of this answer.
    std::wstring unicode = StringToWideString(s, "en_US.UTF-8");
    // WriteConsoleW is named explicitly so that the wide version is used even without UNICODE defined.
    WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), unicode.c_str(), static_cast<DWORD>(unicode.length()), nullptr, nullptr);
#elif defined LINUXLIB
    std::cout << s;
#elif defined MACOSLIB
#endif
}
void WriteUnicodeLineW(const std::wstring& ws)
{
#ifdef WINDOWSLIB
    WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), ws.c_str(), static_cast<DWORD>(ws.length()), nullptr, nullptr);
    std::cout << std::endl;
#elif defined LINUXLIB
    std::cout << WidestringToString(ws, "en_US.UTF-8") << std::endl;
#elif defined MACOSLIB
#endif
}
void WriteUnicodeW(const std::wstring& ws)
{
#ifdef WINDOWSLIB
    WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), ws.c_str(), static_cast<DWORD>(ws.length()), nullptr, nullptr);
#elif defined LINUXLIB
    std::cout << WidestringToString(ws, "en_US.UTF-8");
#elif defined MACOSLIB
#endif
}
Windows code, using the WriteUnicodeLine or WriteUnicode function. The same code can be used on Linux.
std::wstring x = L"\0\001日本ABC\0DE\0F\0G🐶\0"s;
std::string result = WidestringToString(x, "en_US.UTF-8");
WriteUnicodeLine(u8"RESULT=" + result);
WriteUnicodeLine(u8"RESULT_SIZE=" + std::to_string(result.size()));
Finally, on Windows you need a console with powerful and complete support for Unicode characters.
I recommend ConEmu, set as the default terminal on Windows.
Tested on Microsoft Visual Studio and JetBrains CLion:
Tested on Microsoft Visual Studio 2017 with VC++; std=c++17. (Windows Project)
Tested on Microsoft Visual Studio 2017 with g++; std=c++17. (Linux Project)
Tested on Jetbrains Clion 2018.3 with g++; std=c++17. (Linux Toolchain / Remote)
QA
Q. Why don't you use the <codecvt> header functions and classes?
A. They are deprecated. Removed or deprecated features are impossible to build on VC++, although g++ has no problems with them. I prefer zero warnings and no headaches.
Q. wstring on Windows are interchan.
A. They are deprecated. Removed or deprecated features are impossible to build on VC++, although g++ has no problems with them. I prefer zero warnings and no headaches.
Q. Is std::wstring cross-platform?
A. No. std::wstring uses wchar_t elements. On Windows, wchar_t is 2 bytes and each character is stored in UTF-16 units; a character above U+FFFF is represented by two UTF-16 units (2 wchar_t elements) called a surrogate pair. On Linux, wchar_t is 4 bytes and each character is stored in a single wchar_t element, so no surrogate pairs are needed. Check Standard data types on UNIX, Linux, and Windows.
Q. Is std::string cross-platform?
A. Yes. std::string uses char elements. The char type is guaranteed to have the same size in all compilers: 1 byte. Check Standard data types on UNIX, Linux, and Windows.
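To make the surrogate-pair point concrete, here is a small sketch of how one code point is turned into UTF-16 units. The function name to_utf16_units is hypothetical, and the input is assumed to be a valid code point (at most U+10FFFF and not itself a surrogate).

```cpp
#include <cstdint>
#include <vector>

// Encodes one Unicode code point as one or two UTF-16 code units.
std::vector<std::uint16_t> to_utf16_units(std::uint32_t codePoint) {
    if (codePoint < 0x10000) {
        // Basic Multilingual Plane: a single unit holds the value as is.
        return { static_cast<std::uint16_t>(codePoint) };
    }
    // Above U+FFFF: split the remaining 20 bits into a high and a low surrogate.
    const std::uint32_t rest = codePoint - 0x10000;
    return { static_cast<std::uint16_t>(0xD800 + (rest >> 10)),
             static_cast<std::uint16_t>(0xDC00 + (rest & 0x3FF)) };
}
```

For 日 (U+65E5) this returns a single unit, so the character fits in one wchar_t on both Windows and Linux. For 🐶 (U+1F436) it returns the pair 0xD83D 0xDC36, which is why the dog in the test string above occupies two wchar_t elements on Windows but only one on Linux.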
Running a console app from VS2017 under Win10 with UK regional settings required me to:
set VS2017 Tools > Environment > Fonts and Colors > Font to 'Lucida', for instance
save the C++ source files with the encoding "Unicode (UTF-8 with signature) - Codepage 65001", so that you can type accented character literals such as L"âéïôù" without compiler warnings, yet avoid double-byte characters everywhere
compile with Configuration Properties > General > Character Set > "Use Multi-byte.." and the "/utf-8" flag under Configuration Properties > C/C++ > All Options > Additional Options
#include <iostream>, <io.h> and <fcntl.h>
execute an obscure '_setmode(_fileno(stdout), _O_WTEXT);' once at the start of the app
forget 'cout << ... ;' and use only 'wcout << ... ;'
For the record, VS2015 on Win7 required a 'SetConsoleOutputCP(65001);' and allowed mixing output via wcout and cout.
If you were looking for a portable solution, which is unfortunately still not part of a standard as of C++20, I can recommend the nowide library. It comes either standalone or as part of boost. You'll find many standard counterparts consuming or emitting utf-8 encoded chars there. Yes, chars, not char8_ts (yet). Feel free to use char8_t-remediation utilities to interpret char8_ts as chars, if your program sports them already.
The requested code snippet would look like this:
#include <boost/nowide/iostream.hpp>
#include <char8_t-remediation.h>
int main()
{
using boost::nowide::cout;
cout << U8("¡Hola!") << std::endl;
}
Note: Please be aware of streams orientation issue. A short recommendation in the context of my answer would be: Use exclusively nowide streams for input / output and utf-8 encoded data.
In my case I'm reading a UTF-8 file and printing it to the console. I find that wifstream works very well; even the Visual Studio debugger shows the UTF-8 words correctly (I'm reading Traditional Chinese). From this post:
#include <codecvt>
#include <fstream>
#include <iostream>
#include <locale>
#include <sstream>
#include <string>
std::wstring readFile(const char* filename)
{
    std::wifstream wif(filename);
    // Decode the file's UTF-8 bytes into wchar_t. std::locale::empty() is an MSVC extension.
    wif.imbue(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t>));
    std::wstringstream wss;
    wss << wif.rdbuf();
    return wss.str();
}
// usage
std::wstring wstr2;
wstr2 = readFile("C:\\yourUtf8File.txt");
std::wcout << wstr2;
Testing with VS2019 and a UNICODE console app on Win10, I found the following when testing Spanish and Japanese:
If you just wprintf a string, you get the wrong characters for Spanish (Japanese not tested, but I am sure it won't work). It appears that the default "C" locale is ASCII (the traditional extended ASCII collating table for PCs).
Using setlocale(LC_ALL, ""); sets the correct code page, CP1252, when using the Spanish (Mexico) Windows language setting, and the output is good (Lucida Console font). However, Japanese output (using the Japanese Windows language) is suppressed: there is no output for those characters, while normal Latin characters are output.
Using _setmode(_fileno(stdout), _O_U16TEXT); makes output work correctly for all of them. However, all output is then 16-bit, so redirecting to a file writes 16-bit characters.
Using printf with UTF-8 text and SetConsoleOutputCP(CP_UTF8) also works (but not if you call it after setlocale(LC_ALL, ""); I had to remove that call for the output to work).
Fonts: for the Asian characters use MS Mincho; for the others you can use Lucida Console.

fontInfo.FontFamily = 54;
fontInfo.FontWeight = 400;
fontInfo.nFont = 0;
const wchar_t myFont[] = L"KaiTi";
fontInfo.dwFontSize = { 18, 41 };
std::copy(myFont, myFont + (sizeof(myFont) / sizeof(wchar_t)), fontInfo.FaceName);
SetCurrentConsoleFontEx(hConsole, false, &fontInfo);
std::wcout << L"Hello World!" << std::endl;
std::wcout << L"你好!" << std::endl;
return 0;
}
Cheers !
Edit on 2021-11-20
Maybe you can also try to use the new Windows Terminal. It seems to print unicode out of the box. You will still need to configure a font that supports your characters in the settings. It is developed by Microsoft as OpenSource on github and you can also install it from the Microsoft Store. I successfully tried this on Windows 10.
You can use the open-source {fmt} library to portably print Unicode text, including on Windows, for example:
#include <fmt/core.h>
int main() {
fmt::print("èéøÞǽлљΣæča");
}
Output:
èéøÞǽлљΣæča
This requires compiling with the /utf-8 compiler option in MSVC.
I don't recommend using wcout because it is non-portable and doesn't even work on Windows without extra efforts, for example:
std::wcout << L"èéøÞǽлљΣæča";
will print:
├и├й├╕├Ю╟╜╨╗╤Щ╬г├ж─Нa
in Russian Windows (ACP 1251, console CP 866).
Disclaimer: I'm the author of {fmt}.
The wcout must have the locale set differently to the CRT. Here's how it can be fixed:
int _tmain(int argc, _TCHAR* argv[])
{
char* locale = setlocale(LC_ALL, "English"); // Get the CRT's current locale.
std::locale lollocale(locale);
setlocale(LC_ALL, locale); // Restore the CRT.
std::wcout.imbue(lollocale); // Now set the std::wcout to have the locale that we got from the CRT.
std::wcout << L"¡Hola!";
std::cin.get();
return 0;
}
I just tested it, and it displays the string here absolutely fine.
SetConsoleCP() and chcp does not the same!
Take this program snippet:
SetConsoleCP(65001) // 65001 = UTF-8
static const char s[]="tränenüberströmt™\n";
DWORD slen=lstrlen(s);
WriteConsoleA(GetStdHandle(STD_OUTPUT_HANDLE),s,slen,&slen,NULL);
The source code must be saved as UTF-8 without BOM (Byte Order Mark; Signature). Then, the Microsoft compiler cl.exe takes the UTF-8 strings as-is.
If this code is saved with BOM, cl.exe transcodes the string to ANSI (i.e. CP1252), which doesn't match to CP65001 (= UTF-8).
Change the display font to Lucidia Console, otherwise, UTF-8 output will not work at all.
Type: chcp
Answer: 850
Type: test.exe
Answer: tr├ñnen├╝berstr├ÂmtÔäó
Type: chcp
Answer: 65001 - This setting has changed by SetConsoleCP() but with no useful effect.
Type: chcp 65001
Type: test.exe
Answer: tränenüberströmt™ - All OK now.
Tested with: German Windows XP SP3
Solution 1: use WCHAR
One thing that always works: use wide char all the places. Like,
const wchar_t* str = L"你好\n";
DWORD nwritten = 0;
WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), str, 3, &nwritten, NULL);
Unicode is language neutral. You can use any language and won't have encoding issues. You want use UTF-8? Fine. Use MultiByteToWideChar to convert it wide char string first.
Before continue reading the other solution below, please note this one has a unique advantage: it doesn't depend on the system or the user's locale settings.
Solution 2: Set system locale and user locale properly. And they should be the same.
I assume UTF-8 locale for Windows isn't in the picture yet. Then you need to know which language(Chinese, French?) you would use, and change your system settings to match it. There is system level setting:
And a user level setting:
Please set both of them to the same language.
Then, in your program, insert "setlocale(LC_ALL, "");" to your main function. It's a universal rule, no matter which OS you use, whenever you want to use the standard library to process charsets other than ASCII, you should have this line of code. Otherwise, the locale defaults to "C" and it only contains ASCII. Then you can start to use std::wcout and C functions like fputws.
I don't think there is an easy answer. looking at Console Code Pages and SetConsoleCP Function it seems that you will need to set-up an appropriate codepage for the character-set you're going to output.
Recenly I wanted to stream unicode from Python to windows console and here is the minimum I needed to make:
You should set console font to the one covering unicode symbols. There is not a wide choise: Console properties > Font > Lucida Console
You should change the current console codepage: run chcp 65001 in the Console or use the corresponding method in the C++ code
write to console using WriteConsoleW
Look through an interesing article about java unicode on windows console
Besides, in Python you can not write to default sys.stdout in this case, you will need to substitute it with something using os.write(1, binarystring) or direct call to a wrapper around WriteConsoleW. Seems like in C++ you will need to do the same.
First, sorry I probably don't have the fonts required so I cannot test it yet.
Something looks a bit fishy here
// the following is said to be working
SetConsoleOutputCP(CP_UTF8); // output is in UTF8
wchar_t s[] = L"èéøÞǽлљΣæča";
int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
char* m = new char[bufferSize];
WideCharToMultiByte(CP_UTF8, 0, s, -1, m, bufferSize, NULL, NULL);
wprintf(L"%S", m); // <-- upper case %S in wprintf() is used for MultiByte/utf-8
// lower case %s in wprintf() is used for WideChar
printf("%s", m); // <-- does this work as well? try it to verify my assumption
while
// the following is said to have problem
SetConsoleOutputCP(CP_UTF8);
utf8_locale = locale(old_locale,
new boost::program_options::detail::utf8_codecvt_facet());
wcout.imbue(utf8_locale);
wcout << L"¡Hola!" << endl; // <-- you are passing wide char.
// have you tried passing the multibyte equivalent by converting to utf8 first?
int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
char* m = new char[bufferSize];
WideCharToMultiByte(CP_UTF8, 0, s, -1, m, bufferSize, NULL, NULL);
cout << m << endl;
what about
// without setting locale to UTF8, you pass WideChars
wcout << L"¡Hola!" << endl;
// set locale to UTF8 and use cout
SetConsoleOutputCP(CP_UTF8);
cout << utf8_encoded_by_converting_using_WideCharToMultiByte << endl;
There are a few issues with the MSVC runtime (msvcrt) and iostreams.
The _setmode(_fileno(stdout), _O_U16TEXT); trick works only with MS VC++, not with MinGW-GCC. Moreover, it sometimes leads to crashes, depending on the Windows configuration.
SetConsoleCP(65001) selects UTF-8. It may fail in many multibyte character scenarios, but it is always OK for UTF-16LE.
You need to restore the previous console code page on application exit.
The Windows console supports Unicode through the ReadConsole and WriteConsole functions in UTF-16LE mode. Side effect: piping will not work in this case, i.e. myapp.exe >> ret.log produces a 0-byte ret.log file. If you are OK with this, you can try my library as follows.
const char* umessage = "Hello!\nПривет!\nПривіт!\nΧαιρετίσματα!\nHelló!\nHallå!\n";
...
#include <console.hpp>
#include <ios>
...
std::ostream& cout = io::console::out_stream();
cout << umessage
<< 1234567890ull << '\n'
<< 123456.78e+09 << '\n'
<< 12356.789e+10L << '\n'
<< std::hex << 0xCAFEBABE
<< std::endl;
The library automatically converts your UTF-8 into UTF-16LE and writes it to the console using WriteConsole. There are error and input streams as well.
Another benefit of the library: colors.
Link to the example app:
https://github.com/incoder1/IO/tree/master/examples/iostreams
The library homepage:
https://github.com/incoder1/IO
Screenshot:
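Independently of that library, the advice above about restoring the previous console code page on exit can be sketched with a small RAII guard. ConsoleCodePageGuard is a hypothetical name, not part of any library mentioned here; on non-Windows builds it does nothing. Note that the destructor will not run if the process is terminated abnormally.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: switches the console code pages, restores them on scope exit.
class ConsoleCodePageGuard {
public:
    explicit ConsoleCodePageGuard(unsigned codePage)
    {
#ifdef _WIN32
        // Both getters return 0 when the process has no console.
        oldOutput_ = GetConsoleOutputCP();
        oldInput_ = GetConsoleCP();
        ok_ = SetConsoleOutputCP(codePage) != 0 && SetConsoleCP(codePage) != 0;
#else
        (void)codePage;  // nothing to switch on other platforms
        ok_ = true;
#endif
    }

    ~ConsoleCodePageGuard()
    {
#ifdef _WIN32
        if (oldOutput_ != 0) SetConsoleOutputCP(oldOutput_);
        if (oldInput_ != 0) SetConsoleCP(oldInput_);
#endif
    }

    ConsoleCodePageGuard(const ConsoleCodePageGuard&) = delete;
    ConsoleCodePageGuard& operator=(const ConsoleCodePageGuard&) = delete;

    // True if both code pages were switched (always true off Windows).
    bool ok() const { return ok_; }
    // Code pages that were active before the switch (0 if unknown).
    unsigned previousOutputCodePage() const { return oldOutput_; }
    unsigned previousInputCodePage() const { return oldInput_; }

private:
    unsigned oldOutput_ = 0;
    unsigned oldInput_ = 0;
    bool ok_ = false;
};
```

Typical use would be to declare ConsoleCodePageGuard guard(65001); at the top of main and check guard.ok() before printing.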
Correctly displaying Western European characters in the windows console
Long story short:
use chcp to find which codepage works for you. In my case it was chcp 28591 for Western Europe.
optionally make it the default: REG ADD HKCU\Console /v CodePage /t REG_DWORD /d 28591
History of the discovery
I had a similar problem, with Java. It is just cosmetic, since it involves log lines sent to the console; but it is still annoying.
The output from our Java application is supposed to be in UTF-8 and it displays correctly in eclipse's console. But in windows console, it just shows the ASCII box-drawing characters: Inicializaci├│n and art├¡culos instead of Inicialización and artículos.
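To see where those box-drawing characters come from, it helps to look at the raw bytes. The helper below (hex_bytes is a name made up for this illustration) dumps a string's bytes in hex. The letter ó (U+00F3) is encoded in UTF-8 as the two bytes C3 B3, and an OEM code page such as 850 shows 0xC3 as ├ and 0xB3 as │, which matches the garbled output quoted above.

```cpp
#include <string>

// Illustrative helper: returns the bytes of s as space-separated hex pairs.
std::string hex_bytes(const std::string& s)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : s) {
        if (!out.empty()) {
            out += ' ';
        }
        out += digits[c >> 4];    // high nibble
        out += digits[c & 0x0F];  // low nibble
    }
    return out;
}
```

For example, hex_bytes("\xC3\xB3") gives "C3 B3", the UTF-8 encoding of ó.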
I stumbled upon a related question and mixed some of the answers to get to the solution that worked for me. The solution is to change the code page used by the console and to use a font that supports Unicode (like Consolas or Lucida Console). You can select the font in the system menu of the Windows console:
Start a console by any one of
Win + R then type cmd and hit the Return key.
Hit the Win key and type cmd followed by the return key.
Open the system menu by any one of
click the upper left corner icon
Hit the Alt + Space key combination
then select "Default" to change the behavior of all subsequent console windows
click the "Font" tab
Select Consolas or Lucida console
Click OK
Regarding the codepage, for a one-off case, you can get it done with the command chcp and then you have to investigate which codepage is correct for your set of characters. Several answers suggested UTF-8 codepage, which is 65001, but that codepage didn't work for my Spanish characters.
Another answer suggested a batch script for interactively selecting the code page you want from a list. There I found the code page for ISO-8859-1 that I needed: 28591. So you could execute
chcp 28591
before each execution of your application. You might check which code page is right for you in the Code Page Identifiers MSDN page.
Yet another answer indicated how to persist the selected codepage as the default for your windows console. It involves changing the registry, so consider yourself warned that you might brick your machine by using this solution.
REG ADD HKCU\Console /v CodePage /t REG_DWORD /d 28591
This creates the CodePage value with the 28591 data inside the HKCU\Console registry key. And that did work for me.
Please note that HKCU ("HKEY_CURRENT_USER") is only for the current user. If you want to change it for all users in that computer, you'll need to use the regedit utility and find/create the corresponding Console key (probably you'll have to create a Console key inside HKEY_USERS\.DEFAULT)
Default encoding on:
Windows UTF-16.
Linux UTF-8.
MacOS UTF-8.
My solution, step by step. It preserves embedded null chars \0 (to avoid truncation), and the conversion itself does not use functions from the windows.h header:
Add Macros to detect Platform.
#if defined (_WIN32)
#define WINDOWSLIB 1
#elif defined (__ANDROID__) || defined(ANDROID)//Android
#define ANDROIDLIB 1
#elif defined (__APPLE__)//iOS, Mac OS
#define MACOSLIB 1
#elif defined (__LINUX__) || defined(__gnu_linux__) || defined(__linux__)//_Ubuntu - Fedora - Centos - RedHat
#define LINUXLIB 1
#endif
Create conversion functions std::wstring to std::string or viceversa.
#include <locale>
#include <iostream>
#include <string>
#ifdef WINDOWSLIB
#include <Windows.h>
#endif
using namespace std::literals::string_literals;
// Convert std::wstring to std::string
std::string WidestringToString(const std::wstring& wstr, const std::string& locale)
{
if (wstr.empty())
{
return std::string();
}
size_t pos;
size_t begin = 0;
std::string ret;
size_t size;
#ifdef WINDOWSLIB
_locale_t lc = _create_locale(LC_ALL, locale.c_str());
pos = wstr.find(static_cast<wchar_t>(0), begin);
while (pos != std::wstring::npos && begin < wstr.length())
{
std::wstring segment = std::wstring(&wstr[begin], pos - begin);
_wcstombs_s_l(&size, nullptr, 0, &segment[0], _TRUNCATE, lc);
std::string converted = std::string(size, 0);
_wcstombs_s_l(&size, &converted[0], size, &segment[0], _TRUNCATE, lc);
ret.append(converted);
begin = pos + 1;
pos = wstr.find(static_cast<wchar_t>(0), begin);
}
if (begin <= wstr.length()) {
std::wstring segment = std::wstring(&wstr[begin], wstr.length() - begin);
_wcstombs_s_l(&size, nullptr, 0, &segment[0], _TRUNCATE, lc);
std::string converted = std::string(size, 0);
_wcstombs_s_l(&size, &converted[0], size, &segment[0], _TRUNCATE, lc);
converted.resize(size - 1);
ret.append(converted);
}
_free_locale(lc);
#elif defined LINUXLIB
std::string currentLocale = setlocale(LC_ALL, nullptr);
setlocale(LC_ALL, locale.c_str());
pos = wstr.find(static_cast<wchar_t>(0), begin);
while (pos != std::wstring::npos && begin < wstr.length())
{
std::wstring segment = std::wstring(&wstr[begin], pos - begin);
size = wcstombs(nullptr, segment.c_str(), 0);
std::string converted = std::string(size, 0);
wcstombs(&converted[0], segment.c_str(), converted.size());
ret.append(converted);
ret.append({ 0 });
begin = pos + 1;
pos = wstr.find(static_cast<wchar_t>(0), begin);
}
if (begin <= wstr.length()) {
std::wstring segment = std::wstring(&wstr[begin], wstr.length() - begin);
size = wcstombs(nullptr, segment.c_str(), 0);
std::string converted = std::string(size, 0);
wcstombs(&converted[0], segment.c_str(), converted.size());
ret.append(converted);
}
setlocale(LC_ALL, currentLocale.c_str());
#elif defined MACOSLIB
#endif
return ret;
}
// Convert std::string to std::wstring
std::wstring StringToWideString(const std::string& str, const std::string& locale)
{
if (str.empty())
{
return std::wstring();
}
size_t pos;
size_t begin = 0;
std::wstring ret;
size_t size;
#ifdef WINDOWSLIB
_locale_t lc = _create_locale(LC_ALL, locale.c_str());
pos = str.find(static_cast<char>(0), begin);
while (pos != std::string::npos) {
std::string segment = std::string(&str[begin], pos - begin);
std::wstring converted = std::wstring(segment.size() + 1, 0);
_mbstowcs_s_l(&size, &converted[0], converted.size(), &segment[0], _TRUNCATE, lc);
converted.resize(size - 1);
ret.append(converted);
ret.append({ 0 });
begin = pos + 1;
pos = str.find(static_cast<char>(0), begin);
}
if (begin < str.length()) {
std::string segment = std::string(&str[begin], str.length() - begin);
std::wstring converted = std::wstring(segment.size() + 1, 0);
_mbstowcs_s_l(&size, &converted[0], converted.size(), &segment[0], _TRUNCATE, lc);
converted.resize(size - 1);
ret.append(converted);
}
_free_locale(lc);
#elif defined LINUXLIB
std::string currentLocale = setlocale(LC_ALL, nullptr);
setlocale(LC_ALL, locale.c_str());
pos = str.find(static_cast<char>(0), begin);
while (pos != std::string::npos) {
std::string segment = std::string(&str[begin], pos - begin);
std::wstring converted = std::wstring(segment.size(), 0);
size = mbstowcs(&converted[0], &segment[0], converted.size());
converted.resize(size);
ret.append(converted);
ret.append({ 0 });
begin = pos + 1;
pos = str.find(static_cast<char>(0), begin);
}
if (begin < str.length()) {
std::string segment = std::string(&str[begin], str.length() - begin);
std::wstring converted = std::wstring(segment.size(), 0);
size = mbstowcs(&converted[0], &segment[0], converted.size());
converted.resize(size);
ret.append(converted);
}
setlocale(LC_ALL, currentLocale.c_str());
#elif defined MACOSLIB
#endif
return ret;
}
Print std::string.
Check RawString Suffix.
Linux Code. Print directly std::string using std::cout.
If you have std::wstring.
1. Convert to std::string.
2. Print with std::cout.
std::wstring x = L"\0\001日本ABC\0DE\0F\0G🐶\0"s;
std::string result = WidestringToString(x, "en_US.UTF-8");
std::cout << "RESULT=" << result << std::endl;
std::cout << "RESULT_SIZE=" << result.size() << std::endl;
On Windows, printing Unicode requires WriteConsole, which can print Unicode characters from a std::wstring or, after conversion, from a std::string.
void WriteUnicodeLine(const std::string& s)
{
#ifdef WINDOWSLIB
WriteUnicode(s);
std::cout << std::endl;
#elif defined LINUXLIB
std::cout << s << std::endl;
#elif defined MACOSLIB
#endif
}
void WriteUnicode(const std::string& s)
{
#ifdef WINDOWSLIB
std::wstring unicode = Insane::String::Strings::StringToWideString(s);
WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), unicode.c_str(), static_cast<DWORD>(unicode.length()), nullptr, nullptr);
#elif defined LINUXLIB
std::cout << s;
#elif defined MACOSLIB
#endif
}
void WriteUnicodeLineW(const std::wstring& ws)
{
#ifdef WINDOWSLIB
WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), ws.c_str(), static_cast<DWORD>(ws.length()), nullptr, nullptr);
std::cout << std::endl;
#elif defined LINUXLIB
std::cout << String::Strings::WidestringToString(ws)<<std::endl;
#elif defined MACOSLIB
#endif
}
void WriteUnicodeW(const std::wstring& ws)
{
#ifdef WINDOWSLIB
WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), ws.c_str(), static_cast<DWORD>(ws.length()), nullptr, nullptr);
#elif defined LINUXLIB
std::cout << String::Strings::WidestringToString(ws);
#elif defined MACOSLIB
#endif
}
Windows code, using the WriteUnicodeLine or WriteUnicode function. The same code can be used on Linux.
std::wstring x = L"\0\001日本ABC\0DE\0F\0G🐶\0"s;
std::string result = WidestringToString(x, "en_US.UTF-8");
WriteUnicodeLine(u8"RESULT" + result);
WriteUnicodeLine(u8"RESULT_SIZE" + std::to_string(result.size()));
Finally, on Windows you need powerful and complete support for Unicode characters in the console.
I recommend ConEmu, set as the default terminal on Windows.
Tested on Microsoft Visual Studio and JetBrains CLion.
Tested on Microsoft Visual Studio 2017 with VC++; std=c++17. (Windows Project)
Tested on Microsoft Visual Studio 2017 with g++; std=c++17. (Linux Project)
Tested on Jetbrains Clion 2018.3 with g++; std=c++17. (Linux Toolchain / Remote)
QA
Q. Why don't you use the <codecvt> header functions and classes?
A. They are deprecated. Removed or deprecated features fail to build on VC++, although g++ has no problems with them. I prefer zero warnings and no headaches.
Q. wstring on Windows are interchan.
A. Deprecate Removed or deprecated features impossible build on VC++, but no problems on g++. I prefer 0 warnings and headaches.
Q. Is std::wstring cross-platform?
A. No. std::wstring uses wchar_t elements. On Windows, wchar_t is 2 bytes and text is stored as UTF-16 code units; a character above U+FFFF is represented by two UTF-16 units (2 wchar_t elements) called a surrogate pair. On Linux, wchar_t is 4 bytes and each character is stored in a single wchar_t element, so no surrogate pairs are needed. Check Standard data types on UNIX, Linux, and Windows.
Q. Is std::string cross-platform?
A. Yes. std::string uses char elements, and sizeof(char) is guaranteed to be 1 byte on all compilers. Check Standard data types on UNIX, Linux, and Windows.
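The surrogate pair point can be made concrete with a few lines of portable code. This is an illustrative sketch (to_utf16 is a made-up name) and assumes the input is a valid Unicode scalar value, i.e. at most U+10FFFF and not itself a surrogate:

```cpp
#include <cstdint>
#include <vector>

// Illustrative helper: encodes one code point as UTF-16 code units.
// Assumes cp is a valid Unicode scalar value.
std::vector<std::uint16_t> to_utf16(std::uint32_t cp)
{
    if (cp < 0x10000) {
        // Basic Multilingual Plane: a single 16-bit unit is enough.
        return { static_cast<std::uint16_t>(cp) };
    }
    cp -= 0x10000;  // leaves a 20-bit value
    return {
        static_cast<std::uint16_t>(0xD800 + (cp >> 10)),   // high surrogate: top 10 bits
        static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF))  // low surrogate: bottom 10 bits
    };
}
```

With this, 日 (U+65E5) comes out as one unit, while 🐶 (U+1F436) comes out as the pair 0xD83D 0xDC36. That is why the same text can have a different length in a std::wstring on Windows than on Linux.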
Running a console app from VS2017 under Win10 with UK regional settings required me to:
set VS2017 Tools > Environment > Fonts and Colors > Font: 'Lucida', for instance
save C++ source files with encoding "Unicode (UTF-8 with signature) - Codepage 65001" so that you can type in accented character literals such as L"âéïôù" without compiler warnings, yet avoid double-byte characters everywhere
compile with Configuration Properties > General > Character Set > "Use Multi-byte.." and the Configuration Properties > C/C++ > All Options > Additional Options > "/utf-8" flag
#include <iostream>, <io.h>, and <fcntl.h>
execute an obscure '_setmode(_fileno(stdout), _O_WTEXT);' once at the start of the app
forget 'cout << ... ;' and use only 'wcout << ... ;'
For the record, VS2015 on Win7 required a 'SetConsoleOutputCP(65001);' and allowed mixing output via wcout and cout.
If you were looking for a portable solution, which is unfortunately still not part of a standard as of C++20, I can recommend the nowide library. It comes either standalone or as part of boost. You'll find many standard counterparts consuming or emitting utf-8 encoded chars there. Yes, chars, not char8_ts (yet). Feel free to use char8_t-remediation utilities to interpret char8_ts as chars, if your program sports them already.
The requested code snippet would look like this:
#include <boost/nowide/iostream.hpp>
#include <char8_t-remediation.h>
int main()
{
using boost::nowide::cout;
cout << U8("¡Hola!") << std::endl;
}
Note: Please be aware of streams orientation issue. A short recommendation in the context of my answer would be: Use exclusively nowide streams for input / output and utf-8 encoded data.
In my case I'm reading a UTF-8 file and printing it to the console. I find that wifstream works very well; even the Visual Studio debugger shows the UTF-8 words correctly (I'm reading Traditional Chinese). From this post:
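#include <codecvt>
#include <fstream>
#include <iostream>
#include <locale>
#include <sstream>
#include <string>
// Reads a whole UTF-8 encoded file into a wide string.
std::wstring readFile(const char* filename)
{
std::wifstream wif(filename);
// std::locale::empty() is an MSVC extension; std::codecvt_utf8 is deprecated since C++17.
wif.imbue(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t>));
std::wstringstream wss;
wss << wif.rdbuf();
return wss.str();
}
// usage
std::wstring wstr2;
wstr2 = readFile("C:\\yourUtf8File.txt");
std::wcout << wstr2;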
Testing with VS2019, with a UNICODE console app on Win10, I found the following when testing Spanish and Japanese:
If you just wprintf a string, you get the wrong characters for Spanish (Japanese not tested, but it surely won't work). It appears that the default "C" locale is ASCII (the traditional extended ASCII collating table for PCs).
Using setlocale(LC_ALL, ""); sets the correct code page, CP1252, when using the Spanish (Mexico) Windows language setting, and the output is good (Lucida Console font). However, Japanese output (using the Japanese Windows language) is suppressed, meaning there is no output for those characters; normal Latin characters are output.
Using _setmode(_fileno(stdout), _O_U16TEXT); output works correctly for all. However, all output is 16-bit, so redirection to a file writes 16-bit characters.
Using printf and UTF-8 text output with SetConsoleOutputCP(CP_UTF8) also works (but not if you set it after setlocale(LC_ALL, ""); I had to remove that call for the output to work).
Fonts: for the Asian characters use MS Mincho; for the others you can use Lucida Console.

Can't display unicode in c++ console on English windows [duplicate]

Unicode Hello World in Chinese
Here is a Hello World in Chinese. Actually it is just "Hello". I tested this on Windows 10, but I think it might work since Windows Vista.
Before Windows Vista it will be hard, if you want a programmatic solution, instead of configuring the console / registry etc. Maybe have a look here if you really need to do this on Windows 7: Change console Font Windows 7
I don't want to claim this is the only solution, but this is what worked for me.
Outline
Unicode project setup
Set the console codepage to unicode
Find and use a font that supports the characters you want to display
Use the locale of the language you want to display
Use the wide character output i.e. std::wcout
1 Project Setup
I am using Visual Studio 2017 CE. I created a blank console app. The default settings are all right, but if you experience problems or use a different IDE, you might want to check these:
In your project properties, find Configuration Properties -> General -> Project Defaults -> Character Set. It should be "Use Unicode Character Set", not "Multi-Byte".
This will define _UNICODE and UNICODE preprocessor macros for you.
int wmain(int argc, wchar_t* argv[])
Also I think we should use wmain function instead of main. They both work, but in a unicode environment wmain may be more convenient.
Also my source files are UTF-16-LE encoded, which seems to be the default in Visual Studio 2017.
2 Console Codepage
This is quite obvious: we need a Unicode code page in the console.
If you want to check your default code page, just open a console and type chcp without any arguments.
We have to change it to 65001, which is the UTF-8 code page. See Windows Code Page Identifiers.
There is a preprocessor macro for that code page: CP_UTF8.
I needed to set both the input and the output code page. When I omitted either one, the output was incorrect.
SetConsoleOutputCP(CP_UTF8);
SetConsoleCP(CP_UTF8);
You might also want to check the boolean return values of those functions.
3 Choose a Font
So far I haven't found a console font that supports every character, so I had to choose one.
If the characters you want to output are only partly available in one font and partly in another, then I believe it is impossible to find a solution, unless there is a font out there that supports every character. I also didn't look into how to install a font.
I think it is not possible to use two different fonts in the same console window at the same time.
How to find a compatible font?
Open your console, go to the properties of the console window by clicking on the icon in the upper left of the window. Go to the fonts tab and choose a font and click ok. Then try to enter your characters in the console window. Repeat this until you find a font you can work with. Then note down the name of the font.
Also you can change the size of the font in the properties window.
If you found a size you are happy with, note down the size values that are displayed in the properties window in the section "selected font". It will show width and height in pixels.
To actually set the font programmatically you use:
CONSOLE_FONT_INFOEX fontInfo;
// ... configure fontInfo
SetCurrentConsoleFontEx(hConsole, false, &fontInfo);
See my example at the end of this answer for details.
Or look it up in the fine manual: SetCurrentConsoleFontEx.
This function is only available since Windows Vista.
4 Set the locale
You will need to set the locale to that of the language whose characters you want to print.
char* a = setlocale(LC_ALL, "chinese");
The return value is interesting. It contains a string describing exactly which locale was chosen.
Just give it a try :-)
I tested with chinese and german.
More info: setlocale
5 Use wide character output
Not much to say here.
If you want to output wide characters, use this for example:
std::wcout << L"你好" << std::endl;
Oh, and don't forget the L prefix for wide characters!
And if you type literal Unicode characters like this in the source file, the source file must be Unicode encoded, like the Visual Studio default of UTF-16-LE. Or maybe use Notepad++ and set the encoding to UCS-2 LE BOM.
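If keeping the source file in a particular encoding is a hassle, one option is to write the characters as universal character names, which are plain ASCII in the source. A small sketch (kHello is just an illustrative name); both characters are in the Basic Multilingual Plane, so each one takes a single wchar_t on Windows as well as on Linux:

```cpp
#include <string>

// U+4F60 U+597D ("ni hao") written with universal character names,
// so the literal does not depend on the encoding of the source file.
const std::wstring kHello = L"\u4F60\u597D";
```

Printing it works the same way as with the literal characters, e.g. std::wcout << kHello << std::endl; once the console has been set up as described in the previous steps.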
Example
Finally I put it all together as an example:
#include <Windows.h>
#include <algorithm>
#include <iostream>
#include <io.h>
#include <fcntl.h>
#include <locale.h>
#include <wincon.h>
int wmain(int argc, wchar_t* argv[])
{
SetConsoleTitle(L"My Console Window - 你好");
HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE);
// Select the locale of the language to display; the return value names the locale actually chosen.
char* a = setlocale(LC_ALL, "chinese");
// Set both the output and the input code page to UTF-8.
SetConsoleOutputCP(CP_UTF8);
SetConsoleCP(CP_UTF8);
// Zero-initialize the structure so that no field is left indeterminate.
CONSOLE_FONT_INFOEX fontInfo = {};
fontInfo.cbSize = sizeof(fontInfo);
fontInfo.FontFamily = 54;
fontInfo.FontWeight = 400;
fontInfo.nFont = 0;
const wchar_t myFont[] = L"KaiTi";
fontInfo.dwFontSize = { 18, 41 };
// Copy the font name, including its terminating null, into FaceName.
std::copy(myFont, myFont + (sizeof(myFont) / sizeof(wchar_t)), fontInfo.FaceName);
SetCurrentConsoleFontEx(hConsole, false, &fontInfo);
std::wcout << L"Hello World!" << std::endl;
std::wcout << L"你好!" << std::endl;
return 0;
}
Cheers !
Edit on 2021-11-20
Maybe you can also try the new Windows Terminal. It seems to print Unicode out of the box. You will still need to configure a font that supports your characters in the settings. It is developed by Microsoft as open source on GitHub, and you can also install it from the Microsoft Store. I successfully tried this on Windows 10.
You can use the open-source {fmt} library to portably print Unicode text, including on Windows, for example:
#include <fmt/core.h>
int main() {
fmt::print("èéøÞǽлљΣæča");
}
Output:
èéøÞǽлљΣæča
This requires compiling with the /utf-8 compiler option in MSVC.
I don't recommend using wcout because it is non-portable and doesn't even work on Windows without extra effort, for example:
std::wcout << L"èéøÞǽлљΣæča";
will print:
├и├й├╕├Ю╟╜╨╗╤Щ╬г├ж─Нa
in Russian Windows (ACP 1251, console CP 866).
Disclaimer: I'm the author of {fmt}.
The wcout must have the locale set differently to the CRT. Here's how it can be fixed:
#include <tchar.h>
#include <clocale>
#include <iostream>
#include <locale>
int _tmain(int argc, _TCHAR* argv[])
{
char* locale = setlocale(LC_ALL, "English"); // Set the CRT locale and get its full name.
std::locale lollocale(locale); // Build a matching C++ locale from that name.
setlocale(LC_ALL, locale); // Restore the CRT.
std::wcout.imbue(lollocale); // Now give std::wcout the same locale as the CRT.
std::wcout << L"¡Hola!";
std::cin.get();
return 0;
}
I just tested it, and it displays the string here absolutely fine.
SetConsoleCP() and chcp do not do the same thing!
Take this program snippet:
SetConsoleCP(65001); // 65001 = UTF-8; note that this sets the console input code page
static const char s[]="tränenüberströmt™\n";
DWORD slen=lstrlenA(s);
WriteConsoleA(GetStdHandle(STD_OUTPUT_HANDLE),s,slen,&slen,NULL);
The source code must be saved as UTF-8 without a BOM (byte order mark; signature). Then the Microsoft compiler cl.exe takes the UTF-8 strings as-is.
If this code is saved with a BOM, cl.exe transcodes the string to ANSI (i.e. CP1252), which doesn't match CP65001 (= UTF-8).
Change the display font to Lucida Console; otherwise, UTF-8 output will not work at all.
Type: chcp
Answer: 850
Type: test.exe
Answer: tr├ñnen├╝berstr├ÂmtÔäó
Type: chcp
Answer: 65001 - This setting has been changed by SetConsoleCP(), but with no useful effect.
Type: chcp 65001
Type: test.exe
Answer: tränenüberströmt™ - All OK now.
Tested with: German Windows XP SP3
Solution 1: use WCHAR
One thing that always works: use wide chars everywhere. For example:
const wchar_t* str = L"你好\n";
DWORD nwritten = 0;
WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), str, 3, &nwritten, NULL);
Unicode is language neutral. You can use any language and won't have encoding issues. You want to use UTF-8? Fine. Use MultiByteToWideChar to convert it to a wide char string first.
Before you continue reading the other solution below, please note that this one has a unique advantage: it doesn't depend on the system or the user's locale settings.
Solution 2: Set system locale and user locale properly. And they should be the same.
I assume a UTF-8 locale for Windows isn't in the picture yet. Then you need to know which language (Chinese, French?) you will use, and change your system settings to match it. There is a system-level setting:
And a user level setting:
Please set both of them to the same language.
Then, in your program, insert "setlocale(LC_ALL, "");" to your main function. It's a universal rule, no matter which OS you use, whenever you want to use the standard library to process charsets other than ASCII, you should have this line of code. Otherwise, the locale defaults to "C" and it only contains ASCII. Then you can start to use std::wcout and C functions like fputws.
I don't think there is an easy answer. Looking at Console Code Pages and the SetConsoleCP function, it seems that you will need to set up an appropriate code page for the character set you're going to output.
Default encoding on:
Windows UTF-16.
Linux UTF-8.
MacOS UTF-8.
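My solution, step by step. It preserves embedded null characters (\0) instead of truncating at them. The conversion functions do not use anything from the windows.h header; only the console output step does:
Add macros to detect the platform.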
#if defined (_WIN32)
#define WINDOWSLIB 1
#elif defined (__ANDROID__) || defined(ANDROID)//Android
#define ANDROIDLIB 1
#elif defined (__APPLE__)//iOS, Mac OS
#define MACOSLIB 1
#elif defined (__LINUX__) || defined(__gnu_linux__) || defined(__linux__)//_Ubuntu - Fedora - Centos - RedHat
#define LINUXLIB 1
#endif
Create conversion functions from std::wstring to std::string and vice versa.
#include <clocale> // setlocale
#include <cstdlib> // wcstombs, mbstowcs
#include <locale>
#include <iostream>
#include <string>
#ifdef WINDOWSLIB
#include <Windows.h>
#endif
using namespace std::literals::string_literals;
// Convert std::wstring to std::string
std::string WidestringToString(const std::wstring& wstr, const std::string& locale)
{
if (wstr.empty())
{
return std::string();
}
size_t pos;
size_t begin = 0;
std::string ret;
size_t size;
#ifdef WINDOWSLIB
_locale_t lc = _create_locale(LC_ALL, locale.c_str());
pos = wstr.find(static_cast<wchar_t>(0), begin);
while (pos != std::wstring::npos && begin < wstr.length())
{
std::wstring segment = std::wstring(&wstr[begin], pos - begin);
_wcstombs_s_l(&size, nullptr, 0, &segment[0], _TRUNCATE, lc);
std::string converted = std::string(size, 0);
_wcstombs_s_l(&size, &converted[0], size, &segment[0], _TRUNCATE, lc);
ret.append(converted);
begin = pos + 1;
pos = wstr.find(static_cast<wchar_t>(0), begin);
}
if (begin <= wstr.length()) {
std::wstring segment = std::wstring(&wstr[begin], wstr.length() - begin);
_wcstombs_s_l(&size, nullptr, 0, &segment[0], _TRUNCATE, lc);
std::string converted = std::string(size, 0);
_wcstombs_s_l(&size, &converted[0], size, &segment[0], _TRUNCATE, lc);
converted.resize(size - 1);
ret.append(converted);
}
_free_locale(lc);
#elif defined LINUXLIB
std::string currentLocale = setlocale(LC_ALL, nullptr);
setlocale(LC_ALL, locale.c_str());
pos = wstr.find(static_cast<wchar_t>(0), begin);
while (pos != std::wstring::npos && begin < wstr.length())
{
std::wstring segment = std::wstring(&wstr[begin], pos - begin);
size = wcstombs(nullptr, segment.c_str(), 0);
std::string converted = std::string(size, 0);
wcstombs(&converted[0], segment.c_str(), converted.size());
ret.append(converted);
ret.append({ 0 });
begin = pos + 1;
pos = wstr.find(static_cast<wchar_t>(0), begin);
}
if (begin <= wstr.length()) {
std::wstring segment = std::wstring(&wstr[begin], wstr.length() - begin);
size = wcstombs(nullptr, segment.c_str(), 0);
std::string converted = std::string(size, 0);
wcstombs(&converted[0], segment.c_str(), converted.size());
ret.append(converted);
}
setlocale(LC_ALL, currentLocale.c_str());
#elif defined MACOSLIB
#endif
return ret;
}
// Convert std::string to std::wstring
std::wstring StringToWideString(const std::string& str, const std::string& locale)
{
if (str.empty())
{
return std::wstring();
}
size_t pos;
size_t begin = 0;
std::wstring ret;
size_t size;
#ifdef WINDOWSLIB
_locale_t lc = _create_locale(LC_ALL, locale.c_str());
pos = str.find(static_cast<char>(0), begin);
while (pos != std::string::npos) {
std::string segment = std::string(&str[begin], pos - begin);
std::wstring converted = std::wstring(segment.size() + 1, 0);
_mbstowcs_s_l(&size, &converted[0], converted.size(), &segment[0], _TRUNCATE, lc);
converted.resize(size - 1);
ret.append(converted);
ret.append({ 0 });
begin = pos + 1;
pos = str.find(static_cast<char>(0), begin);
}
if (begin < str.length()) {
std::string segment = std::string(&str[begin], str.length() - begin);
std::wstring converted = std::wstring(segment.size() + 1, 0);
_mbstowcs_s_l(&size, &converted[0], converted.size(), &segment[0], _TRUNCATE, lc);
converted.resize(size - 1);
ret.append(converted);
}
_free_locale(lc);
#elif defined LINUXLIB
std::string currentLocale = setlocale(LC_ALL, nullptr);
setlocale(LC_ALL, locale.c_str());
pos = str.find(static_cast<char>(0), begin);
while (pos != std::string::npos) {
std::string segment = std::string(&str[begin], pos - begin);
std::wstring converted = std::wstring(segment.size(), 0);
size = mbstowcs(&converted[0], &segment[0], converted.size());
converted.resize(size);
ret.append(converted);
ret.append({ 0 });
begin = pos + 1;
pos = str.find(static_cast<char>(0), begin);
}
if (begin < str.length()) {
std::string segment = std::string(&str[begin], str.length() - begin);
std::wstring converted = std::wstring(segment.size(), 0);
size = mbstowcs(&converted[0], &segment[0], converted.size());
converted.resize(size);
ret.append(converted);
}
setlocale(LC_ALL, currentLocale.c_str());
#elif defined MACOSLIB
#endif
return ret;
}
Print the std::string.
Check RawString Suffix.
Linux code: print the std::string directly with std::cout.
If you have a std::wstring:
1. Convert it to std::string.
2. Print it with std::cout.
std::wstring x = L"\0\001日本ABC\0DE\0F\0G🐶\0"s;
std::string result = WidestringToString(x, "en_US.UTF-8");
std::cout << "RESULT=" << result << std::endl;
std::cout << "RESULT_SIZE=" << result.size() << std::endl;
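To check what the conversion should produce, here is a locale-independent sketch that encodes code points to UTF-8 by hand. It is not the method used above (that one goes through the C locale functions); toUtf8 and appendUtf8 are hypothetical names, and the sketch assumes valid code points (no surrogate or range checks).

```cpp
#include <string>

// Hypothetical helper: append the UTF-8 encoding of one code point.
// Assumes cp is a valid Unicode scalar value (not validated here).
void appendUtf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {                      // 1 byte, includes embedded '\0'
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {              // 2 bytes
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {            // 3 bytes
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {                              // 4 bytes
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Hypothetical helper: encode a whole string, keeping embedded nulls.
std::string toUtf8(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text)
        appendUtf8(out, cp);
    return out;
}
```

For the 16 code points of the test string used in this answer, this gives 23 bytes: each of the two kanji takes 3 bytes, the dog emoji takes 4, and every null character stays as a single zero byte.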
On Windows, printing Unicode requires WriteConsole, which can print Unicode characters from a std::wstring directly, or from a std::string after conversion.
// WriteUnicode is defined first because WriteUnicodeLine calls it.
void WriteUnicode(const std::string& s)
{
#ifdef WINDOWSLIB
// Convert to UTF-16 with the function defined above; assumes s holds UTF-8.
std::wstring unicode = StringToWideString(s, "en_US.UTF-8");
// WriteConsoleW is named explicitly so this also builds without UNICODE defined.
WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), unicode.c_str(), static_cast<DWORD>(unicode.length()), nullptr, nullptr);
#elif defined LINUXLIB
std::cout << s;
#elif defined MACOSLIB
#endif
}
void WriteUnicodeLine(const std::string& s)
{
#ifdef WINDOWSLIB
WriteUnicode(s);
std::cout << std::endl;
#elif defined LINUXLIB
std::cout << s << std::endl;
#elif defined MACOSLIB
#endif
}
void WriteUnicodeLineW(const std::wstring& ws)
{
#ifdef WINDOWSLIB
WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), ws.c_str(), static_cast<DWORD>(ws.length()), nullptr, nullptr);
std::cout << std::endl;
#elif defined LINUXLIB
std::cout << WidestringToString(ws, "en_US.UTF-8") << std::endl;
#elif defined MACOSLIB
#endif
}
void WriteUnicodeW(const std::wstring& ws)
{
#ifdef WINDOWSLIB
WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), ws.c_str(), static_cast<DWORD>(ws.length()), nullptr, nullptr);
#elif defined LINUXLIB
std::cout << WidestringToString(ws, "en_US.UTF-8");
#elif defined MACOSLIB
#endif
}
Windows code, using the WriteUnicodeLine or WriteUnicode function. The same code can be used on Linux.
std::wstring x = L"\0\001日本ABC\0DE\0F\0G🐶\0"s;
std::string result = WidestringToString(x, "en_US.UTF-8");
WriteUnicodeLine(u8"RESULT" + result);
WriteUnicodeLine(u8"RESULT_SIZE" + std::to_string(result.size()));
Finally, on Windows you need a console with powerful and complete support for Unicode characters.
I recommend ConEmu, set as the default terminal on Windows.
Tested on Microsoft Visual Studio and JetBrains CLion:
Tested on Microsoft Visual Studio 2017 with VC++; std=c++17. (Windows Project)
Tested on Microsoft Visual Studio 2017 with g++; std=c++17. (Linux Project)
Tested on JetBrains CLion 2018.3 with g++; std=c++17. (Linux Toolchain / Remote)
QA
Q. Why don't you use the <codecvt> header functions and classes?
A. They are deprecated. Removed or deprecated features can make a build fail on VC++, although g++ has no problem with them. I prefer zero warnings and no headaches.
Q. Is std::wstring cross-platform?
A. No. std::wstring uses wchar_t elements. On Windows wchar_t is 2 bytes and each character is stored in UTF-16 units; a character above U+FFFF is represented by two UTF-16 units (2 wchar_t elements) called a surrogate pair. On Linux wchar_t is 4 bytes and each character is stored in one wchar_t element, so no surrogate pairs are needed. Check Standard data types on UNIX, Linux, and Windows.
Q. Is std::string cross-platform?
A. Yes. std::string uses char elements, and the char type is guaranteed to be 1 byte in all compilers. Check Standard data types on UNIX, Linux, and Windows.
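The surrogate-pair point from the Q&A can be sketched in a few lines. toUtf16 is a hypothetical helper of mine; it uses char16_t rather than wchar_t so that it behaves the same on Windows and Linux, and it assumes the input is a valid code point.

```cpp
#include <string>

// Hypothetical helper: UTF-16 code units for a single code point.
// Assumes cp is a valid Unicode scalar value (not validated here).
std::u16string toUtf16(char32_t cp)
{
    std::u16string units;
    if (cp < 0x10000) {
        // Fits in one 16-bit unit.
        units.push_back(static_cast<char16_t>(cp));
    } else {
        // Above U+FFFF: split the 20 remaining bits into a surrogate pair.
        const char32_t offset = cp - 0x10000;
        units.push_back(static_cast<char16_t>(0xD800 + (offset >> 10)));   // high surrogate
        units.push_back(static_cast<char16_t>(0xDC00 + (offset & 0x3FF))); // low surrogate
    }
    return units;
}
```

The dog emoji U+1F436 becomes the two units 0xD83D 0xDC36, so on Windows a std::wstring holding it has length 2, while on Linux it has length 1.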
Running a console app from VS2017 under Win10 with UK regional settings required me to:
set VS2017 Tools > Environment > Fonts and Colors > Font: 'Lucida', for instance
save C++ source files with encoding "Unicode (UTF-8 with signature) - Codepage 65001" so that you can type accented character literals such as L"âéïôù" without compiler warnings, yet avoid double-byte characters everywhere
compile with Configuration Properties > General > Character Set > "Use Multi-byte.." and the Configuration Properties > C/C++ > All Options > Additional Options > "/utf-8" flag
#include <iostream>, <io.h>, and <fcntl.h>
execute an obscure '_setmode(_fileno(stdout), _O_WTEXT);' once at the start of the app
forget 'cout <<... ;' and use only 'wcout << ... ;'
For the record, VS2015 on Win7 required a 'SetConsoleOutputCP(65001);' call and allowed mixing output via wcout and cout.
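The code side of those steps could look roughly like the sketch below. enableWideStdout, accentedSample and printSample are hypothetical names; the literal is written with escapes so that it does not depend on how the source file is saved, and the non-Windows branch assumes no mode switch is needed there.

```cpp
#include <cstdio>
#include <iostream>
#include <string>
#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#endif

// Hypothetical helper: call once at start-up, before any output.
// Returns true when stdout is ready for wide output.
bool enableWideStdout()
{
#ifdef _WIN32
    // After this call, use only wide functions (wcout, wprintf) on stdout.
    return _setmode(_fileno(stdout), _O_WTEXT) != -1;
#else
    return true; // assumption: nothing to switch outside Windows
#endif
}

// The accented sample from the steps above, spelled with escapes.
std::wstring accentedSample()
{
    return L"\u00E2\u00E9\u00EF\u00F4\u00F9";
}

// Prints the sample through wcout only, never cout.
void printSample()
{
    std::wcout << accentedSample() << std::endl;
}
```

A program would call enableWideStdout() at the top of main and printSample() afterwards.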
If you are looking for a portable solution, which is unfortunately still not part of the standard as of C++20, I can recommend the nowide library. It comes either standalone or as part of Boost. There you'll find counterparts to many standard facilities that consume or emit UTF-8 encoded chars. Yes, chars, not char8_ts (yet). Feel free to use char8_t-remediation utilities to interpret char8_ts as chars if your program already uses them.
The requested code snippet would look like this:
#include <boost/nowide/iostream.hpp>
#include <char8_t-remediation.h>
int main()
{
using boost::nowide::cout;
cout << U8("¡Hola!") << std::endl;
}
Note: Please be aware of the stream orientation issue. A short recommendation in the context of my answer would be: use nowide streams exclusively for input / output, together with UTF-8 encoded data.
In my case I'm reading a UTF-8 file and printing it to the console. I find wifstream works very well; even the Visual Studio debugger shows the UTF-8 words correctly (I'm reading Traditional Chinese). From this post:
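#include <codecvt> // std::codecvt_utf8 (deprecated since C++17)
#include <fstream>
#include <iostream>
#include <locale>
#include <sstream>
#include <string>
std::wstring readFile(const char* filename)
{
std::wifstream wif(filename);
// std::locale::empty() is a Microsoft extension; other standard libraries need std::locale() here.
wif.imbue(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t>));
std::wstringstream wss;
wss << wif.rdbuf();
return wss.str();
}
// usage
std::wstring wstr2;
wstr2 = readFile("C:\\yourUtf8File.txt");
std::wcout << wstr2;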
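If the <codecvt> deprecation is a concern, one alternative (a different technique from the wifstream approach above) is to read the raw bytes and decode the UTF-8 by hand. This is only a sketch: readBytes and decodeUtf8 are hypothetical names, and the decoder does minimal checking (it does not reject overlong forms or surrogates).

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: read the whole file as raw bytes, no locale involved.
// Returns an empty string if the file cannot be opened.
std::string readBytes(const char* filename)
{
    std::ifstream in(filename, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: decode UTF-8 into code points.
// Broken sequences are replaced by U+FFFD.
std::u32string decodeUtf8(const std::string& bytes)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i++]);
        char32_t cp = 0;
        int extra = 0; // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); continue; } // stray or invalid lead byte
        bool ok = true;
        for (; extra > 0; --extra, ++i) {
            if (i >= bytes.size()) { ok = false; break; } // truncated input
            const unsigned char next = static_cast<unsigned char>(bytes[i]);
            if ((next & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (next & 0x3F);
        }
        out.push_back(ok ? cp : static_cast<char32_t>(0xFFFD));
    }
    return out;
}
```

decodeUtf8(readBytes("C:\\yourUtf8File.txt")) then gives the file's code points. If the file starts with a byte order mark, it shows up as a leading U+FEFF that the caller may want to drop.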
Testing with VS2019 with a UNICODE console app on Win10, I found the following when testing Spanish and Japanese:
If you just wprintf a string, you get the wrong characters for Spanish (Japanese not tested, but it surely won't work either). It appears the default "C" locale is ASCII (the traditional extended ASCII collating table for PCs).
Using setlocale(LC_ALL, ""); sets the correct code page, CP1252, when using the Spanish (Mexico) Windows language setting, and the output is good (Lucida Console font). However, Japanese output (using the Japanese Windows language) is suppressed: no output appears for those characters, while normal Latin characters are output.
Using _setmode(_fileno(stdout), _O_U16TEXT); output works correctly for all. However, all output is 16-bit, so redirecting to a file writes 16-bit characters.
Using printf and UTF-8 text output with SetConsoleOutputCP(CP_UTF8) also works (but not if you set it after setlocale(LC_ALL, ""); I had to remove that call for the output to work).
Fonts: For the Asian characters use MS Mincho, for the others you can use Lucida Console.
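One way around the 16-bit redirection issue might be to pick the stdout mode depending on whether output goes to a console. This is a sketch under that assumption, not something I tested in the setups above; isConsole and configureStdout are hypothetical names, while _isatty, _fileno and _setmode are the CRT functions.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#else
#include <unistd.h>
#endif

// Hypothetical helper: true when the stream is attached to a console/terminal.
bool isConsole(std::FILE* stream)
{
#ifdef _WIN32
    return _isatty(_fileno(stream)) != 0;
#else
    return isatty(fileno(stream)) != 0;
#endif
}

// Hypothetical helper: choose the stdout translation mode once at start-up.
// Console: UTF-16 text mode. Redirected: UTF-8 text mode, so the file is
// not 16-bit. Either way, only wide output functions may be used afterwards.
void configureStdout()
{
#ifdef _WIN32
    _setmode(_fileno(stdout), isConsole(stdout) ? _O_U16TEXT : _O_U8TEXT);
#endif
}
```

After configureStdout(), the same wprintf or wcout calls should display correctly on screen and produce a UTF-8 file when redirected.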

How do I print UTF-8 from c++ console application on Windows

For a C++ console application compiled with Visual Studio 2008 on English Windows (XP, Vista or 7), is it possible to print to the console and correctly display UTF-8 encoded Japanese using cout or wcout?
This should work:
#include <cstdio>
#include <windows.h>
#pragma execution_character_set( "utf-8" )
int main()
{
SetConsoleOutputCP( 65001 );
printf( "Testing unicode -- English -- Ελληνικά -- Español -- Русский. aäbcdefghijklmnoöpqrsßtuüvwxyz\n" );
}
Don't know if it affects anything, but source file is saved as Unicode (UTF-8 with signature) - Codepage 65001 at FILE -> Advanced Save Options ....
Project -> Properties -> Configuration Properties -> General -> Character Set is set to Use Unicode Character Set.
Some say you need to change console font to Lucida Console, but on my side it is displayed with both Consolas and Lucida Console.
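As far as I know, the code page set with SetConsoleOutputCP stays in effect for that console window after the program exits, so it can be polite to restore the previous one. A minimal sketch of that idea follows; ConsoleOutputCodePage and printUtf8 are hypothetical names, and outside Windows the guard does nothing.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical RAII guard: sets the console output code page and restores
// the previous one when it goes out of scope. A no-op outside Windows.
class ConsoleOutputCodePage
{
public:
    explicit ConsoleOutputCodePage(unsigned codePage)
    {
#ifdef _WIN32
        previous_ = GetConsoleOutputCP();
        changed_ = SetConsoleOutputCP(codePage) != 0;
#else
        (void)codePage;
#endif
    }
    ~ConsoleOutputCodePage()
    {
#ifdef _WIN32
        if (changed_) {
            std::fflush(stdout); // push pending output while the code page is still set
            SetConsoleOutputCP(previous_);
        }
#endif
    }
    ConsoleOutputCodePage(const ConsoleOutputCodePage&) = delete;
    ConsoleOutputCodePage& operator=(const ConsoleOutputCodePage&) = delete;

private:
    unsigned previous_ = 0;
    bool changed_ = false;
};

// Hypothetical helper: write UTF-8 bytes while code page 65001 is active.
// Returns the number of bytes written.
std::size_t printUtf8(const char* text)
{
    ConsoleOutputCodePage guard(65001);
    return std::fwrite(text, 1, std::strlen(text), stdout);
}
```

printUtf8("Espa\xC3\xB1ol\n") writes the nine UTF-8 bytes and leaves the console on whatever code page it had before.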
The Windows console uses the OEM code page by default to display output.
To change the code page to UTF-8, enter chcp 65001 in the console, or change the code page programmatically with SetConsoleOutputCP.
Note that you probably have to change the font of the console to one that has glyphs in the unicode range.
Here's an article from MVP Michael Kaplan on how to correctly output UTF-16 through the console. You could convert your UTF-8 to UTF-16 and output that.
I've never actually tried setting the console code-page to UTF8 (not sure why it wouldn't work... the console can handle other multi-byte code-pages just fine), but there are a couple of functions to look up: SetConsoleCP and SetConsoleOutputCP.
You'll probably also need to make sure you're using a console font that is capable of displaying your characters. There's the SetCurrentConsoleFontEx function, but it's only available on Vista and above.
Hope that helps.
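The "convert your UTF-8 to UTF-16 and output that" idea could be sketched like this. writeUtf8 is a hypothetical name; MultiByteToWideChar, GetConsoleMode and WriteConsoleW are the Win32 calls. When stdout is redirected (or on a non-Windows system) the sketch assumes the right thing is to pass the UTF-8 bytes through unchanged.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: write UTF-8 text to stdout.
// Returns true if everything was written.
bool writeUtf8(const std::string& utf8)
{
    if (utf8.empty())
        return true;
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    if (GetConsoleMode(out, &mode)) { // succeeds only for a real console
        const int bytes = static_cast<int>(utf8.size());
        const int units = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), bytes, nullptr, 0);
        if (units <= 0)
            return false;
        std::wstring wide(static_cast<std::size_t>(units), L'\0');
        MultiByteToWideChar(CP_UTF8, 0, utf8.data(), bytes, &wide[0], units);
        DWORD written = 0;
        return WriteConsoleW(out, wide.data(), static_cast<DWORD>(wide.size()),
                             &written, nullptr) != 0;
    }
#endif
    // Redirected output, or not Windows: pass the UTF-8 bytes through.
    return std::fwrite(utf8.data(), 1, utf8.size(), stdout) == utf8.size();
}
```

With this approach the console code page does not matter for on-screen output, since WriteConsoleW takes UTF-16 directly; the font still has to contain the glyphs.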
At application start the console is set to the default OEM code page, 437.
I was trying to output Unicode text to stdout with the stream switched to UTF-8 translation via _setmode(_fileno(stdout), _O_U8TEXT); and still had no luck on screen, even with the Lucida TT font.
If the output was redirected to a file, a correct UTF-8 file was created.
Finally I got lucky: I added the single line "info.FontFamily = FF_DONTCARE;" and it is working now.
Hope this helps.
void SetLucidaFont()
{
HANDLE StdOut = GetStdHandle(STD_OUTPUT_HANDLE);
CONSOLE_FONT_INFOEX info;
memset(&info, 0, sizeof(CONSOLE_FONT_INFOEX));
info.cbSize = sizeof(CONSOLE_FONT_INFOEX); // prevents err=87 below
if (GetCurrentConsoleFontEx(StdOut, FALSE, &info))
{
info.FontFamily = FF_DONTCARE;
info.dwFontSize.X = 0; // leave X as zero
info.dwFontSize.Y = 14;
info.FontWeight = 400;
wcscpy_s(info.FaceName, L"Lucida Console"); // FaceName is always WCHAR, so use the wide copy
SetCurrentConsoleFontEx(StdOut, FALSE, &info);
}
}
Just for additional information:
'ANSI' refers to windows-125x, used for win32 applications while 'OEM' refers to the code page used by console/MS-DOS applications.
Current active code-pages can be retrieved with functions GetOEMCP() and GetACP().
In order to output something correctly to the console, you should:
ensure the current OEM code page supports the characters you want to output
(if necessary, use SetConsoleOutputCP to set it properly)
convert the string from the current ANSI code page (win32) to the console OEM code page
Here are some utilities for doing so:
// Convert a UTF-16 string (16-bit) to an OEM string (8-bit)
#define UNICODEtoOEM(str) WCHARtoCHAR(str, CP_OEMCP)
// Convert an OEM string (8-bit) to a UTF-16 string (16-bit)
#define OEMtoUNICODE(str) CHARtoWCHAR(str, CP_OEMCP)
// Convert an ANSI string (8-bit) to a UTF-16 string (16-bit)
#define ANSItoUNICODE(str) CHARtoWCHAR(str, CP_ACP)
// Convert a UTF-16 string (16-bit) to an ANSI string (8-bit)
#define UNICODEtoANSI(str) WCHARtoCHAR(str, CP_ACP)
/* Convert a single/multi-byte string to a UTF-16 string (16-bit).
We take advantage of the MultiByteToWideChar function that allows to specify the charset of the input string.
*/
LPWSTR CHARtoWCHAR(LPSTR str, UINT codePage) {
int len = (int) strlen(str) + 1; // include the terminating null
int size_needed = MultiByteToWideChar(codePage, 0, str, len, NULL, 0);
LPWSTR wstr = (LPWSTR) LocalAlloc(LPTR, sizeof(WCHAR) * size_needed);
MultiByteToWideChar(codePage, 0, str, len, wstr, size_needed);
return wstr; // the caller releases it with LocalFree
}
/* Convert a UTF-16 string (16-bit) to a single/multi-byte string.
We take advantage of the WideCharToMultiByte function that allows to specify the charset of the output string.
*/
LPSTR WCHARtoCHAR(LPWSTR wstr, UINT codePage) {
int len = (int) wcslen(wstr) + 1; // include the terminating null
int size_needed = WideCharToMultiByte(codePage, 0, wstr, len, NULL, 0, NULL, NULL);
LPSTR str = (LPSTR) LocalAlloc(LPTR, sizeof(CHAR) * size_needed );
WideCharToMultiByte(codePage, 0, wstr, len, str, size_needed, NULL, NULL);
return str; // the caller releases it with LocalFree
}
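To see why the ANSI to OEM conversion matters, here is a tiny illustration with a hand-written excerpt of the Windows-1252 and code page 437 tables (Spanish characters only). It is not a real converter, and ansiToOem437 is a hypothetical name; real code should go through the API-based utilities above.

```cpp
#include <string>

// One character as encoded in Windows-1252 (ANSI) and in code page 437 (OEM).
struct BytePair { unsigned char ansi; unsigned char oem; };

// Hand-written excerpt, Spanish characters only (illustrative, incomplete).
const BytePair kSpanish[] = {
    {0xE1, 0xA0}, // á
    {0xE9, 0x82}, // é
    {0xED, 0xA1}, // í
    {0xF3, 0xA2}, // ó
    {0xFA, 0xA3}, // ú
    {0xF1, 0xA4}, // ñ
    {0xD1, 0xA5}, // Ñ
    {0xFC, 0x81}, // ü
    {0xBF, 0xA8}, // ¿
    {0xA1, 0xAD}, // ¡
};

// Hypothetical helper: re-encode the characters listed above from
// Windows-1252 to code page 437. Other bytes are copied unchanged,
// which is only correct for plain ASCII.
std::string ansiToOem437(const std::string& ansi)
{
    std::string oem = ansi;
    for (char& c : oem) {
        for (const BytePair& p : kSpanish) {
            if (static_cast<unsigned char>(c) == p.ansi) {
                c = static_cast<char>(p.oem);
                break;
            }
        }
    }
    return oem;
}
```

Note that byte 0xA1 is "¡" in Windows-1252 but "í" in code page 437, so an unconverted ANSI "¡Hola!" shows up as "íHola!" on an OEM 437 console.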
Anyone who needs to read UTF-8 from a file and print it to the console can try wifstream; even the Visual Studio debugger shows the UTF-8 words correctly (I'm processing Traditional Chinese). From this post:
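#include <codecvt> // std::codecvt_utf8 (deprecated since C++17)
#include <fstream>
#include <iostream>
#include <locale>
#include <sstream>
#include <string>
std::wstring readFile(const char* filename)
{
std::wifstream wif(filename);
// std::locale::empty() is a Microsoft extension; other standard libraries need std::locale() here.
wif.imbue(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t>));
std::wstringstream wss;
wss << wif.rdbuf();
return wss.str();
}
// usage
std::wstring wstr2;
wstr2 = readFile("C:\\yourUtf8File.txt");
std::wcout << wstr2;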
In the console, enter chcp 65001 to change the code page to that of UTF-8.