std::wcout strange error: truncated output of std::wstring - c++

I'm rather curious about the phenomenon, std::wcout can't output the whole content of std::wstring. Am I missing something?
Here is my output:
F:\
F:\
My code snippet is as follows:
std::wstring ws(L"F:\\右旋不规则.pdf");
std::wcout << ws << std::endl;
std::wcout << ws.data() << std::endl;

There are already several threads on this topic:
Output unicode strings in Windows console app
Using Unicode font in C++ console app
Output Unicode to console Using C++, in Windows
The point is that the system needs to be able to display your Chinese characters (they are Chinese, right?). I don't think the default fonts available for the console can do that. Lucida Console covers many Unicode characters, but I don't think it can display Chinese. If you have a font that can, you can add it to the console.
How to display japanese Kanji inside a cmd window under windows?
https://superuser.com/questions/5035/how-to-change-the-windows-xp-console-font

Related

Unicode console output with wxwidgets

I'm trying to use wxwidgets 3.0.3 with mingw-w64 on Windows.
wxPrintf is patched like https://github.com/wxWidgets/wxWidgets/commit/06458cb89fb8449f377b0b782404b9a9cbe3ae2d#diff-9cf4eef4822377649a928c11237e38f6
Source code is saved in UTF-8
I init wxLocale like:
wxLocale m_locale;
m_locale.Init(wxLANGUAGE_RUSSIAN, wxLOCALE_DONT_LOAD_DEFAULT);
The console output has 8 bit encoding (CP866), that I can check with GetConsoleOutputCP() and GetConsoleCP(). So I have correct output for Latin and Russian characters, but not for Greek (Lucida Console font is used):
wxString s = L"Latin, Русский, \u03BE ρπξ\n\n";
or
wxString s = wxString::FromUTF8("Latin, Русский, \u03BE ρπξ\n\n");
wxPrintf(s.utf8_str()); // not correct output for Greek
If I force console output to be UTF-8:
SetConsoleCP(CP_UTF8);
SetConsoleOutputCP(CP_UTF8);
then wxPrintf no longer works correctly.
(I can get correct output once by using std::cout << s.utf8_str().data(). Some memory leak?)
Using SetConsoleOutputCP(CP_WINUNICODE); doesn't change the console encoding (it remains CP866).
Is there a way to get Unicode console output using standard wxWidgets means (wxPrintf and the stream classes provided by the wxWidgets library)?

Is it possible to cout an EM DASH on Linux and Windows? [duplicate]

I haven't been able to find a way to cout a '—' character. Whether I put it in the cout statement like this: cout << "—"; or use char(151), the program prints out a fuzzy undefined character. Do you guys see anything wrong with my code? Is couting an EM DASH even possible?
Edit: I've also tried wcout << L"—"; and std::wcout << wchar_t(0x2014);. Those both print nothing in my terminal.
First of all, EM DASH is a Unicode character (just making sure you know that).
Printing unicode characters depends on what you're printing to.
If you're printing to a Unix terminal (or a terminal emulator) that uses an encoding supporting this character, and that encoding matches the compiler's execution encoding, then you can do exactly what you did in your source code: cout << "—";
If you're getting fuzzy undefined characters, it is possible that your terminal just doesn't support that character.
If you're on Windows (where it is harder), you can do something like this (which is not portable):
#include <iostream>
#include <io.h>
#include <fcntl.h>
int main() {
_setmode(_fileno(stdout), _O_U16TEXT);
std::wcout << L"—";
}
There's no universal support for Unicode in C++ and in various terminals, so there won't be a portable solution.
The thing is that the Windows console uses code pages by default. It probably uses UTF-16 internally but converts to and from the current console code page when interacting with the outside. So simply printing a UTF-16 code unit like std::wcout << wchar_t(0x2014); won't work without any prior setup. You need to either switch the console to UTF-8 by running chcp 65001 in the console, or switch stdout to UTF-16 mode with _setmode(_fileno(stdout), _O_U16TEXT); in code, before printing the character with
std::wcout << L"—";
It will not always work because Unicode support in the Windows console is poor. In many cases the characters don't appear due to issues in the renderer or font and are replaced with squares or ????. In that case, just copy the text out and paste it into any Unicode-aware text box and it will be displayed properly.
If you're using Windows in English or another Western European language that uses code page 1252, then you can print the em dash, which is at byte value 151 (0x97) in that code page, simply with
cout << (char)151;
If it doesn't work then you're not on code page 1252. You can change it to 1252 if possible, or look up the em dash in your code page (if it is available there).
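To see why byte value 151 only means an em dash under one particular code page, here is a minimal sketch of the Windows-1252 mapping for the 0x80 to 0x9F range. The function name cp1252_to_unicode is made up for illustration, and 0 is used as a stand-in for the five bytes that Windows-1252 leaves unassigned.

```cpp
#include <cstdint>

// Hypothetical helper: map one Windows-1252 byte to its Unicode code point.
// Returns 0 for the five bytes that Windows-1252 leaves unassigned.
std::uint32_t cp1252_to_unicode(unsigned char byte) {
    static const std::uint16_t high[32] = {
        0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, // 0x80-0x87
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,      // 0x88-0x8F
        0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014, // 0x90-0x97
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178  // 0x98-0x9F
    };
    if (byte >= 0x80 && byte <= 0x9F) {
        return high[byte - 0x80];
    }
    // Outside that range, Windows-1252 bytes equal their Unicode code points.
    return byte;
}
```

Here cp1252_to_unicode(151) gives 0x2014 (EM DASH). Under a different code page the same byte stands for a different character, which is why cout << (char)151 is not portable.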
On Linux things are much simpler because UTF-8 is used by default. So you can output the string as normal without resorting to std::wcout
std::cout << "—"; // need to make sure that std::string is in UTF-8
// or use std::cout << u8"—" to force the encoding (before C++20; since C++20 u8 literals are char8_t and cannot be streamed to std::cout)
In fact you'll often get surprising results if you use wide strings on Linux. std::wcout << L"—" often won't work because of some possible bugs in libc
That said, the Windows 10 console now supports UTF-8 well and even allows using UTF-8 as the locale, so if you don't need to support Windows 7 then there's a universal method to print any Unicode string:
std::cout << u8"—";

Printing UTF8 characters to linux console using C++

I am trying to get the unicode character macron (U+00AF), i.e., an overscore, to print consistently on various linux consoles. So far, some consoles work (e.g., putty ssh), others do not (e.g., ubuntu shell), and I have not been able to figure out what I am doing right in one case (probably luck) and wrong in the other.
I do know the basics of Unicode and Utf8, but I have not been able to figure out how to consistently get consoles to display the appropriate characters.
Any suggestions? Note that this is explicitly for unix consoles - all of the similar questions I have found focused on Windows-specific console commands.
Here is what I would effectively like to get working:
wchar_t post = L'¯'; //0xC2AF
std::wcout << post << std::endl;
Unfortunately nothing I tried or could find in the way of suggestions consistently displayed the appropriate character, so I ended up using an ASCII hyphen '-' as a close enough match.
The solution is to put it into the stream as a multibyte string (note that the code point is U+00AF; 0xC2 0xAF is its UTF-8 encoding, not the code point):
std::string s = "\u00AF";
std::cout << s << std::endl;
or to set a locale using the
char* setlocale( int category, const char* locale);
function:
std::locale old_locale; // current locale
setlocale(LC_ALL, "en_US.UTF-8");
wchar_t w = 0x00AF; // the code point, not the UTF-8 bytes
std::wcout << w << std::endl;
setlocale(LC_ALL, old_locale.name().c_str()); // restore locale
The final result is however dependent on many user settings (console, fonts, etc.), so there is no guarantee that it will be OK.
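The value 0xC2AF that appears in the question is not a code point: it is the two-byte UTF-8 encoding (0xC2 0xAF) of U+00AF. A minimal sketch of the encoding step shows where those bytes come from. The helper name encode_utf8 is hypothetical, and the sketch assumes a valid code point (it does not reject surrogates or values above U+10FFFF).

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// Assumes the input is a valid code point; no validation is performed.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                 // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Writing std::cout << encode_utf8(0x00AF) sends the bytes 0xC2 0xAF to the terminal. Whether a macron actually appears still depends on the terminal expecting UTF-8 and having a suitable font.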

printing Unicode characters C++

I'm trying to write a simple command line app to teach myself Japanese, but can't seem to get Unicode characters to print. What am I missing?
#include <iostream>
using namespace std;
int main()
{
wcout << L"こんにちは世界\n";
wcout << L"Hello World\n";
system("pause");
}
In this example only "Press any key to continue" is displayed. Tested on Visual C++ 2013.
This is not easy on Windows. Even when you manage to get the text to the Windows console you still need to configure cmd.exe to be able to display Japanese characters.
#include <iostream>
int main() {
std::cout << "こんにちは世界\n";
}
This works fine on any system where:
The compiler's source and execution encodings include the characters.
The output device (e.g., the console) expects text in the same encoding as the compiler's execution encoding.
A font with the appropriate characters is available (usually not a problem).
Most platforms these days use UTF-8 by default for all these encodings and so can support the entire Unicode range with code similar to the above. Unfortunately Windows is not one of these platforms.
wcout << L"こんにちは世界\n";
In this line the string literal data is (at compile time) converted from the source encoding to the execution wide encoding and then (at run time) wcout uses the locale it is imbued with to convert the wchar_t data to char data for output. Where things go wrong is that the default locale is only required to support characters from the basic source character set, which doesn't even include all ASCII characters, let alone non-ASCII characters.
So the conversion results in an error, putting wcout into a bad state. The error has to be cleared before wcout will function again, which is why the second print statement does not output anything.
You can work around this for a limited range of characters by imbuing wcout with a locale that will successfully convert the characters. Unfortunately the encoding that is needed to support the entire Unicode range this way is UTF-8; although Microsoft's implementation of streams supports other multibyte encodings, it very specifically does not support UTF-8.
For example:
wcout.imbue(std::locale(std::locale::classic(), new std::codecvt_utf8_utf16<wchar_t>()));
SetConsoleOutputCP(CP_UTF8);
wcout << L"こんにちは世界\n";
Here wcout will correctly convert the string to UTF-8, and if the output were written to a file instead of the console then the file would contain the correct UTF-8 data. However the Windows console, even though configured here to accept UTF-8 data, simply will not accept UTF-8 data written in this way.
There are a few options:
Avoid the standard library entirely:
DWORD n;
WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), L"こんにちは世界\n", 8, &n, nullptr);
Use a non-standard magical incantation that will break standard code:
#include <fcntl.h>
#include <io.h>
_setmode(_fileno(stdout), _O_U8TEXT);
std::wcout << L"こんにちは世界\n";
After setting this mode std::cout << "Hello, World"; will crash.
Use a low level IO API along with manual conversion:
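#include <codecvt>
#include <cstdio>
#include <locale>
SetConsoleOutputCP(CP_UTF8);
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
// to_bytes returns a std::string, so pass c_str(); puts appends the newline itself
std::puts(convert.to_bytes(L"こんにちは世界").c_str());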
Using any of these methods, cmd.exe will display the correct text to the best of its ability, by which I mean it will display unreadable boxes. Seven little boxes, for the given string.
You can copy the text out of cmd.exe and into notepad.exe or whatever to see the correct glyphs.
There's a whole series of articles about dealing with Unicode in the Windows console:
http://alfps.wordpress.com/2011/11/22/unicode-part-1-windows-console-io-approaches/
http://alfps.wordpress.com/2011/12/08/unicode-part-2-utf-8-stream-mode/
Basically, you may implement your own streambuf for std::cout (or std::wcout) in terms of WriteConsoleW and enjoy writing UTF-8 (or whatever Unicode encoding you want) to the Windows console without depending on locales or console code pages, and even without using wide characters.
It may not look very straightforward, but it's a convenient and reusable solution, which also gives you portable, utf8-everywhere style user code. Please, don't beat me for my English :)
Or you can change Windows locale to Japanese.

How can I display unicode characters in a linux terminal using C++?

I'm working on a chess game in C++ on a linux environment and I want to display the pieces using unicode characters in a bash terminal. Is there any way to display the symbols using cout?
An example that outputs a knight would be nice: ♞ = U+265E.
To output Unicode characters you just use output streams, the same way you would output ASCII characters. You can store the Unicode code point as a multibyte (UTF-8) string:
std::string str = "\u265E";
std::cout << str << std::endl;
It may also be convenient to use wide character output if you want to output a single Unicode character with a codepoint above the ASCII range:
setlocale(LC_ALL, "en_US.UTF-8");
wchar_t codepoint = 0x265E;
std::wcout << codepoint << std::endl;
However, as others have noted, whether this displays correctly is dependent on a lot of factors in the user's environment, such as whether or not the user's terminal supports Unicode display, whether or not the user has the proper fonts installed, etc. This shouldn't be a problem for most out-of-the-box mainstream distros like Ubuntu/Debian with Gnome installed, but don't expect it to work everywhere.
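For the chess use case, a small lookup from piece letters to glyphs may be handy. The sketch below assumes FEN-style letters (uppercase for white, lowercase for black) and a UTF-8 terminal; the function names piece_glyph and render_rank are made up for illustration. The glyphs are written as explicit UTF-8 bytes so the result does not depend on the compiler's execution character set.

```cpp
#include <string>

// Hypothetical helper: UTF-8 bytes of the chess symbol for a FEN-style
// piece letter. Returns "?" for anything that is not a piece letter.
std::string piece_glyph(char piece) {
    switch (piece) {
        case 'K': return "\xE2\x99\x94"; // U+2654 white king
        case 'Q': return "\xE2\x99\x95"; // U+2655 white queen
        case 'R': return "\xE2\x99\x96"; // U+2656 white rook
        case 'B': return "\xE2\x99\x97"; // U+2657 white bishop
        case 'N': return "\xE2\x99\x98"; // U+2658 white knight
        case 'P': return "\xE2\x99\x99"; // U+2659 white pawn
        case 'k': return "\xE2\x99\x9A"; // U+265A black king
        case 'q': return "\xE2\x99\x9B"; // U+265B black queen
        case 'r': return "\xE2\x99\x9C"; // U+265C black rook
        case 'b': return "\xE2\x99\x9D"; // U+265D black bishop
        case 'n': return "\xE2\x99\x9E"; // U+265E black knight
        case 'p': return "\xE2\x99\x9F"; // U+265F black pawn
        default:  return "?";
    }
}

// Hypothetical helper: render a sequence of piece letters as glyphs,
// each followed by a space.
std::string render_rank(const std::string& pieces) {
    std::string row;
    for (char p : pieces) {
        row += piece_glyph(p);
        row += ' ';
    }
    return row;
}
```

For example, std::cout << render_rank("rnbqkbnr") << '\n'; prints the black back rank on a terminal that expects UTF-8 and has a font containing these symbols.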
Sorry, I misunderstood your question at first. This code prints a white king in the terminal (tested with KDE Konsole):
#include <iostream>
int main(int argc, char* argv[])
{
std::cout <<"\xe2\x99\x94"<<std::endl;
return 0;
}
Normally encoding is specified through a locale. Try to set environment variables.
In order to tell applications to use UTF-8 encoding, and assuming U.S. English is your preferred language, you could use the following command:
export LC_ALL=en_US.UTF-8
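A program can make a rough guess at what the environment asks for before deciding how to print. The sketch below relies on the usual POSIX precedence (LC_ALL overrides LC_CTYPE, which overrides LANG) and a simple substring check; the function names are hypothetical, and the check says nothing about what the terminal itself can actually render.

```cpp
#include <cctype>
#include <cstdlib>
#include <initializer_list>
#include <string>

// Hypothetical helper: does a locale name such as "en_US.UTF-8" or
// "ru_RU.utf8" mention a UTF-8 codeset? Case and hyphens are ignored.
bool names_utf8(const std::string& locale_name) {
    std::string lower;
    for (unsigned char c : locale_name) {
        if (c != '-') {
            lower += static_cast<char>(std::tolower(c));
        }
    }
    return lower.find("utf8") != std::string::npos;
}

// Hypothetical helper: the locale name that governs character encoding,
// taken from the first non-empty variable among LC_ALL, LC_CTYPE, LANG.
// Falls back to "C" when none of them is set.
std::string effective_ctype_locale() {
    for (const char* var : {"LC_ALL", "LC_CTYPE", "LANG"}) {
        const char* value = std::getenv(var);
        if (value != nullptr && *value != '\0') {
            return value;
        }
    }
    return "C";
}
```

If names_utf8(effective_ctype_locale()) is false, falling back to ASCII substitutes is one reasonable option.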
Are you using a "bare" terminal or something running under X-Server?