How can I display unicode characters in a linux terminal using C++?

I'm working on a chess game in C++ on a linux environment and I want to display the pieces using unicode characters in a bash terminal. Is there any way to display the symbols using cout?
An example that outputs a knight would be nice: ♞ = U+265E.

To output Unicode characters you just use output streams, the same way you would output ASCII characters. You can store the Unicode code point in an ordinary narrow string, where it is encoded as a multi-byte (UTF-8) sequence:
std::string str = "\u265E";
std::cout << str << std::endl;
It may also be convenient to use wide character output if you want to output a single Unicode character with a code point above the ASCII range:
#include <clocale>   // for setlocale
setlocale(LC_ALL, "en_US.UTF-8");
wchar_t codepoint = 0x265E;
std::wcout << codepoint << std::endl;
However, as others have noted, whether this displays correctly is dependent on a lot of factors in the user's environment, such as whether or not the user's terminal supports Unicode display, whether or not the user has the proper fonts installed, etc. This shouldn't be a problem for most out-of-the-box mainstream distros like Ubuntu/Debian with Gnome installed, but don't expect it to work everywhere.

Sorry, I misunderstood your question at first. This code prints a white king in the terminal (tested with KDE Konsole):
#include <iostream>

int main(int argc, char* argv[])
{
    std::cout << "\xe2\x99\x94" << std::endl;
    return 0;
}
Normally the encoding is specified through a locale, so try setting the environment variables. To tell applications to use UTF-8 encoding, assuming U.S. English is your preferred language, you could use the following command:
export LC_ALL=en_US.UTF-8
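For example (a minimal sketch): a program can adopt whatever locale the environment specifies by passing an empty string to setlocale, and print what was selected to confirm that a UTF-8 locale is actually in effect:
#include <clocale>
#include <cstdio>

int main()
{
    // "" means: take the locale from LC_ALL / LC_CTYPE / LANG
    const char* loc = std::setlocale(LC_ALL, "");
    std::printf("Active locale: %s\n", loc ? loc : "(setlocale failed)");
}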
Are you using a "bare" terminal or something running under X-Server?

Related

C++ output Unicode in variable

I'm trying to output a string containing unicode characters, which is received with a curl call. Therefore, I'm looking for something similar to the u8 and L prefixes for literal strings, but then applicable to variables. E.g.:
const char *s = u8"\u0444";
However, I have a string containing unicode characters, such as:
mit freundlichen Grüßen
When I want to print this string with:
cout << UnicodeString << endl;
it outputs:
mit freundlichen Gr??en
When I use wcout, it gives me:
mit freundlichen Gren
What am I doing wrong, and how can I achieve the correct output? I return the output with RapidJSON, which returns the string as:
mit freundlichen Gr��en
Important to note: the application is a CGI running on Ubuntu, replying to browser requests.
If you are on Windows, what I would suggest is using Unicode UTF-16 at the Windows boundary.
It seems to me that on Windows with Visual C++ (at least up to VS2015) std::cout cannot output UTF-8-encoded text, but std::wcout correctly outputs UTF-16-encoded text.
This compilable code snippet correctly outputs your string containing German characters:
#include <fcntl.h>
#include <io.h>
#include <iostream>

int main()
{
    _setmode(_fileno(stdout), _O_U16TEXT);

    // ü : U+00FC
    // ß : U+00DF
    const wchar_t* text = L"mit freundlichen Gr\u00FC\u00DFen";
    std::wcout << text << L'\n';
}
Note the use of a UTF-16-encoded wchar_t string.
On a more general note, I would suggest using the UTF-8 encoding (for example storing text in std::strings) in the cross-platform portions of your C++ code, and converting to UTF-16-encoded text at the Windows boundary.
To convert between UTF-8 and UTF-16 you can use Windows APIs like MultiByteToWideChar and WideCharToMultiByte. These are C APIs that can be safely and conveniently wrapped in C++ code (more details can be found in this MSDN article, and you can find compilable C++ code here on GitHub).
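For illustration, here is a minimal sketch (not the linked article's code; the helper name and error handling are placeholders) of wrapping MultiByteToWideChar to convert a UTF-8 std::string into a UTF-16 std::wstring at the Windows boundary:
#include <windows.h>
#include <stdexcept>
#include <string>

std::wstring Utf8ToUtf16(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();

    // First call: ask how many wchar_t units the UTF-16 result needs.
    const int size = ::MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                           utf8.data(), static_cast<int>(utf8.size()),
                                           nullptr, 0);
    if (size == 0) throw std::runtime_error("invalid UTF-8 input");

    // Second call: perform the actual conversion into the buffer.
    std::wstring utf16(size, L'\0');
    ::MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                          utf8.data(), static_cast<int>(utf8.size()),
                          &utf16[0], size);
    return utf16;
}
The reverse direction with WideCharToMultiByte follows the same two-call pattern.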
On my system the following produces the correct output. Try it on your system. I am confident that it will produce similar results.
#include <string>
#include <iostream>
using namespace std;

int main()
{
    string s = "mit freundlichen Grüßen";
    cout << s << endl;
    return 0;
}
If it is ok, then this points to the web transfer not being 8-bit clean.
Mike.
containing unicode characters
You forgot to specify which Unicode encoding the string contains. There is the "narrow" UTF-8, which can be stored in a std::string and printed using std::cout, as well as wider variants, which can't. It is crucial to know which encoding you're dealing with. For the remainder of my answer, I'm going to assume you want to use UTF-8.
When I want to print this string with:
cout << UnicodeString << endl;
EDIT:
Important to note: the application is a CGI running on Ubuntu, replying to browser requests.
The concerns here are slightly different from printing onto a terminal.
- You need to set the Content-Type response header appropriately or else the client cannot know how to interpret the response, for example Content-Type: application/json; charset=utf-8 (see the sketch after this list).
- You still need to make sure that the source string is in fact in the correct encoding corresponding to the header. See the old answer below for an overview.
- The browser has to support the encoding. Most modern browsers have had support for UTF-8 for a long time now.
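As a minimal illustration of the first point (the JSON payload here is made up), a CGI program writes the header, a blank line, and then the UTF-8 body:
#include <iostream>

int main()
{
    // Headers end with a blank line; charset=utf-8 tells the browser how to decode the body.
    std::cout << "Content-Type: application/json; charset=utf-8\r\n\r\n";
    // On Linux/GCC the \u escapes below are encoded as UTF-8 in the narrow literal.
    std::cout << "{\"closing\":\"mit freundlichen Gr\u00FC\u00DFen\"}\n";
}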
Answer regarding printing to terminal:
Assuming that
1. UnicodeString indeed contains a UTF-8 encoded string,
2. the terminal uses UTF-8 encoding, and
3. the font that the terminal uses has the graphemes that you use,
the above should work.
it outputs:
mit freundlichen Gr??en
Then it appears that at least one of the above assumptions doesn't hold.
You can verify whether 1. is true by inspecting the numeric value of each code unit separately and comparing it to what you would expect of UTF-8. If 1. isn't true, then you need to figure out what encoding the string actually uses, and either convert the encoding or configure the terminal to use that encoding.
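For example, a minimal sketch for checking assumption 1: dump each byte (code unit) of the string in hex and compare against the expected UTF-8 sequence (ü should appear as C3 BC, ß as C3 9F):
#include <cstdio>
#include <string>

void dump_bytes(const std::string& s)
{
    for (unsigned char c : s)                      // unsigned char avoids sign-extending high bytes
        std::printf("%02X ", static_cast<unsigned>(c));
    std::printf("\n");
}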
The terminal typically, but not necessarily, uses the system native encoding. The first step of figuring out what encoding your terminal / system uses is to figure out what terminal / system you are using in the first place. The details are probably in a manual.
If the terminal doesn't use UTF-8, then you need to convert the UTF-8 string within your program into the character encoding that the terminal does use - unless that encoding doesn't have the graphemes that you want to print. Unfortunately, the standard library doesn't provide arbitrary character encoding conversion support (there is some support for converting between narrow and wide Unicode, but even that support is deprecated). You can find the Unicode standard here, although I would like to point out that using an existing conversion implementation can save a lot of work.
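As an illustration of that deprecated facility (still shipped by major implementations, but flagged deprecated since C++17, so treat this as a sketch rather than a recommendation), converting a UTF-8 std::string to a wide string looks like this:
#include <codecvt>   // deprecated in C++17
#include <locale>
#include <string>

std::wstring utf8_to_wide(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> conv;
    return conv.from_bytes(utf8);   // throws std::range_error on invalid input
}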
If the character encoding of the terminal doesn't have the needed graphemes - or if you don't want to implement encoding conversion - the alternative is to re-configure the terminal to use UTF-8. If the terminal / system can be configured to use UTF-8, the details should be in the manual.
You should be able to test whether the font itself has the required graphemes simply by typing the characters into the terminal and seeing if they show as they should - although this test will also fail if the terminal encoding does not have the graphemes, so check that first. The manual of your terminal should explain how to change the font, should that be necessary. That said, I would expect ü and ß to exist in most fonts.

Printing smiley face c++

I'm trying to print out the smiley face (from ascii) based on the amount of times the user asks for it, but on the console output screen, it only shows a square with another one inside of it. Where have I gone wrong?
#include <iostream>
using namespace std;

int main()
{
    int smile;
    cout << "How many smiley faces do you want to see? ";
    cin >> smile;
    for (int i = 0; i < smile; i++)
    {
        cout << static_cast<char>(1) << "\t";
    }
    cout << endl;
    return 0;
}
ASCII does not have smileys (so in ASCII you'll have :-) and you expect your reader to understand that as a smiley). But Unicode has several ones, e.g. ☺ (white smiling face, U+263A); see http://unicodeemoticons.com/ or http://www.unicode.org/emoji/charts/emoji-list.html for a nice table of them.
In 2017, it is reasonable to use UTF8 everywhere (in terminals & outputs). UTF-8 is a very common encoding for Unicode, and many Unicode characters are encoded in several bytes in UTF-8.
So in a terminal using UTF8, with a font with many characters available, since ☺ is UTF8 encoded as "\342\230\272", use:
for (int i = 0; i < smile; i++)
{
    cout << "\342\230\272" << "\t";
}
In 2017, most "consoles" are terminal emulators, because real terminals (like the mythical VT100) are today in museums, and you can at least configure these terminal emulators to use UTF-8 encoding. On many operating systems (notably most Linux distributions and MacOSX), they use UTF-8 by default.
If your C++11 compiler accepts UTF8 in strings (and a UTF8 source file), as most do today, you could even have "☺" in your source code. To type that you'll often use some copy and paste technique from an outside source. On my Linux system I often use some Character Map utility (e.g. run charmap in a terminal) to get them.
In ASCII, the character with code 1 is a control character, Start Of Heading. Perhaps you are confusing ASCII with CP437, which is no longer used (but in the 1980s it encoded a smiley at code 1).
You need to use Unicode and understand it. Today, in 2017, you cannot afford to use other encodings externally (they are historical legacy for museums). Of course, if you use unusual characters, you should document that the user of your program should use a font that has them (but most common fonts used in terminal emulators cover a very wide part of Unicode, so that is not a problem in practice). However, on my Linux computers, many fonts lack U+1F642 SLIGHTLY SMILING FACE (e.g. "\360\237\231\202" in a C++ program), which appeared only in Unicode 7.0 in 2014.
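For example, a minimal sketch (assuming a UTF-8 terminal and a font that actually has the glyph) showing the same character written both as raw UTF-8 bytes and as a code point escape:
#include <iostream>

int main()
{
    // U+1F642 SLIGHTLY SMILING FACE, UTF-8 bytes F0 9F 99 82 (octal 360 237 231 202)
    std::cout << "\xF0\x9F\x99\x82" << "\n";
    // Same character via the \U code point escape; the compiler encodes it
    // in the execution charset, which is UTF-8 on typical Linux setups.
    std::cout << "\U0001F642" << "\n";
}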
Just do this in Visual Studio Code to print a smiley:
cout << "\2";
(This relies on the console rendering code 2 as the legacy CP437-style filled smiley; it is not portable.)

Is it possible to cout an EM DASH on Linux and Windows? [duplicate]

I haven't been able to find a way to cout a '—' character. Whether I put it in the cout statement like this: cout << "—"; or use char(151), the program prints out a fuzzy undefined character. Do you guys see anything wrong with my code? Is couting an EM DASH even possible?
Edit: I've also tried wcout << L"—"; and std::wcout << wchar_t(0x2014);. Those both print nothing in my terminal.
First of all, EM DASH is a Unicode character (just making sure you do know that).
Printing unicode characters depends on what you're printing to.
If you're printing to a Unix terminal (or an emulator), and the terminal emulator is using an encoding that supports this character, and that encoding matches the compiler's execution encoding, then you can do what you just did above in your source code: cout << "—";
If you're getting fuzzy undefined characters, it is possible that your terminal just doesn't support that character.
If you're on Windows (where it is harder), you can do something like this (which is not portable):
#include <iostream>
#include <io.h>
#include <fcntl.h>

int main() {
    _setmode(_fileno(stdout), _O_U16TEXT);
    std::wcout << L"—";
}
There's no universal support for Unicode in C++ and in various terminals, so there won't be a portable solution.
The thing is that the Windows console uses code pages by default. It probably uses UTF-16 internally but will always convert to and from the current ANSI codepage when interacting with the outside. So simply printing a UTF-16 code point like std::wcout << wchar_t(0x2014); won't work without any prior setup. You need to either switch the console to UTF-8 by running chcp 65001, or put stdout into Unicode mode with _setmode(_fileno(stdout), _O_U16TEXT); in code, before printing the character out with
std::wcout << L"—";
It will not always work because of the worse Unicode support in the Windows console. In many cases the characters don't appear due to issues in the renderer or font, and are replaced with squares or ????. But in that case just copy the text out and paste it into any Unicode text box, and it will be displayed properly.
If you're using Windows in English or some other Western European language that uses codepage 1252/ISO-8859-1, then you can print the em-dash, which is at code point 151, simply with
cout << (char)151;
If it doesn't work then you're not on codepage 1252. You can change to it if possible, or look up the em-dash in your codepage (if available).
On Linux things are much simpler because UTF-8 is used by default. So you can output the string as normal without resorting to std::wcout:
std::cout << "—"; // need to make sure that std::string is in UTF-8
// or use std::cout << u8"—" to force the encoding
In fact you'll often get surprising results if you use wide strings on Linux: std::wcout << L"—" often won't work because of possible bugs in libc.
That said, the Windows 10 console now supports UTF-8 and even allows you to use UTF-8 as the locale, so if you don't need to support Windows 7 then there's a universal method to print any Unicode string:
std::cout << u8"—";
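For example, a minimal sketch assuming a Windows 10 console and a compiler whose execution charset is UTF-8 (e.g. MSVC with the /utf-8 switch):
#include <windows.h>
#include <iostream>

int main()
{
    SetConsoleOutputCP(CP_UTF8);   // make the console interpret output as UTF-8
    std::cout << "—\n";            // narrow string, UTF-8 bytes with /utf-8
}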

Printing UTF8 characters to linux console using C++

I am trying to get the unicode character macron (U+00AF), i.e., an overscore, to print consistently on various linux consoles. So far, some consoles work (e.g., putty ssh), others do not (e.g., ubuntu shell), and I have not been able to figure out what I am doing right in one case (probably luck) and wrong in the other.
I do know the basics of Unicode and Utf8, but I have not been able to figure out how to consistently get consoles to display the appropriate characters.
Any suggestions? Note that this is explicitly for unix consoles - all of the similar questions I have found focused on Windows-specific console commands.
Here is what I would effectively like to get working:
wchar_t post = L'¯'; // U+00AF (UTF-8 bytes 0xC2 0xAF)
std::wcout << post << std::endl;
Unfortunately nothing I tried or could find in the way of suggestions consistently displayed the appropriate character, so I ended up using an ASCII hyphen '-' as a close enough match.
The solution is to put it into the stream as a multi-byte string:
std::string s = "\u00AF";
std::cout << s << std::endl;
or to set a locale using the setlocale function, char* setlocale(int category, const char* locale), from <clocale>:
std::locale old_locale; // current locale
setlocale(LC_ALL, "en_US.UTF-8");
wchar_t w = 0x00AF; // U+00AF MACRON
std::wcout << w << std::endl;
setlocale(LC_ALL, old_locale.name().c_str()); // restore locale
The final result is however dependent on many user settings (console, fonts, etc.), so there is no guarantee that it will be OK.

printing Unicode characters C++

I'm trying to write a simple command line app to teach myself Japanese, but can't seem to get Unicode characters to print. What am I missing?
#include <cstdlib>
#include <iostream>
using namespace std;

int main()
{
    wcout << L"こんにちは世界\n";
    wcout << L"Hello World\n";
    system("pause");
}
In this example only "Press any key to continue" is displayed. Tested on Visual C++ 2013.
This is not easy on Windows. Even when you manage to get the text to the Windows console you still need to configure cmd.exe to be able to display Japanese characters.
#include <iostream>

int main() {
    std::cout << "こんにちは世界\n";
}
This works fine on any system where:
- The compiler's source and execution encodings include the characters.
- The output device (e.g., the console) expects text in the same encoding as the compiler's execution encoding.
- A font with the appropriate characters is available (usually not a problem).
Most platforms these days use UTF-8 by default for all these encodings and so can support the entire Unicode range with code similar to the above. Unfortunately Windows is not one of these platforms.
wcout << L"こんにちは世界\n";
In this line the string literal data is (at compile time) converted from the source encoding to the execution wide encoding and then (at run time) wcout uses the locale it is imbued with to convert the wchar_t data to char data for output. Where things go wrong is that the default locale is only required to support characters from the basic source character set, which doesn't even include all ASCII characters, let alone non-ASCII characters.
So the conversion results in an error, putting wcout into a bad state. The error has to be cleared before wcout will function again, which is why the second print statement does not output anything.
You can work around this for a limited range of characters by imbuing wcout with a locale that will successfully convert the characters. Unfortunately the encoding that is needed to support the entire Unicode range this way is UTF-8; although Microsoft's implementation of streams supports other multibyte encodings, it very specifically does not support UTF-8.
For example:
// needs <locale>, <codecvt> and <windows.h> (for SetConsoleOutputCP)
wcout.imbue(std::locale(std::locale::classic(), new std::codecvt_utf8_utf16<wchar_t>()));
SetConsoleOutputCP(CP_UTF8);
wcout << L"こんにちは世界\n";
Here wcout will correctly convert the string to UTF-8, and if the output were written to a file instead of the console then the file would contain the correct UTF-8 data. However the Windows console, even though configured here to accept UTF-8 data, simply will not accept UTF-8 data written in this way.
There are a few options:
1. Avoid the standard library entirely:
    DWORD n;
    WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), L"こんにちは世界\n", 8, &n, nullptr);
2. Use a non-standard magical incantation that will break standard code:
    #include <fcntl.h>
    #include <io.h>
    _setmode(_fileno(stdout), _O_U8TEXT);
    std::wcout << L"こんにちは世界\n";
After setting this mode std::cout << "Hello, World"; will crash.
3. Use a low level IO API along with manual conversion:
    #include <codecvt>
    #include <locale>
    SetConsoleOutputCP(CP_UTF8);
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
    std::puts(convert.to_bytes(L"こんにちは世界\n").c_str());
Using any of these methods, cmd.exe will display the correct text to the best of its ability, by which I mean it will display unreadable boxes. Seven little boxes, for the given string.
You can copy the text out of cmd.exe and into notepad.exe or whatever to see the correct glyphs.
There's a whole pair of articles about dealing with Unicode in the Windows console:
http://alfps.wordpress.com/2011/11/22/unicode-part-1-windows-console-io-approaches/
http://alfps.wordpress.com/2011/12/08/unicode-part-2-utf-8-stream-mode/
Basically, you can implement your own streambuf for std::cout (or std::wcout) in terms of WriteConsoleW and enjoy writing UTF-8 (or whatever Unicode you want) to the Windows console without depending on locales or console code pages, and even without using wide characters.
It may not look very straightforward, but it's a convenient and reusable solution, which can also give you portable utf8-everywhere style user code. Please, don't beat me for my English :)
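The following is a rough sketch of that idea (not the articles' code; the class name and buffering policy are simplified, a UTF-8 execution charset such as MSVC /utf-8 is assumed, and it only works when stdout is an actual console): a streambuf that collects UTF-8 narrow output, converts it to UTF-16 and hands it to WriteConsoleW, so the console code page never gets involved.
#include <windows.h>
#include <iostream>
#include <streambuf>
#include <string>

// Accumulates UTF-8 narrow output and writes it to the console as UTF-16.
class ConsoleUtf8Buf : public std::streambuf {
protected:
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof())) {
            buffer_.push_back(traits_type::to_char_type(ch));
            if (traits_type::to_char_type(ch) == '\n') flush_buffer();
        }
        return traits_type::not_eof(ch);
    }
    int sync() override { flush_buffer(); return 0; }
private:
    void flush_buffer() {
        if (buffer_.empty()) return;
        // Convert the buffered UTF-8 bytes to UTF-16 for the console API.
        const int len = ::MultiByteToWideChar(CP_UTF8, 0, buffer_.data(),
                                              static_cast<int>(buffer_.size()), nullptr, 0);
        std::wstring wide(len, L'\0');
        ::MultiByteToWideChar(CP_UTF8, 0, buffer_.data(),
                              static_cast<int>(buffer_.size()), &wide[0], len);
        DWORD written = 0;
        ::WriteConsoleW(::GetStdHandle(STD_OUTPUT_HANDLE), wide.data(),
                        static_cast<DWORD>(wide.size()), &written, nullptr);
        buffer_.clear();
    }
    std::string buffer_;
};

int main() {
    ConsoleUtf8Buf buf;
    std::streambuf* old = std::cout.rdbuf(&buf);   // route std::cout through the converter
    std::cout << "こんにちは世界\n";               // UTF-8 narrow output
    std::cout.rdbuf(old);                          // restore before buf goes out of scope
}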
Or you can change the Windows locale to Japanese.