printing to windows console degrees (°) and cube symbol (³) - c++

I'm working on a c++ windows console program and I need to print degrees (°) and cube symbol (³).
There's tons of info on the ° and the only way that worked for me was:
cout << value << "\370 C" << endl;
Now, what terminology is this? I need the same thing for ³.
I've read somewhere that \370 is octal code, but I can not find any relevant chart with it mentioned that way, or with any equivalent for ³.

You can try something like
cout << value << (char)176 << " C" << endl;
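where the number cast to char is the decimal code of the character in the console's code page (these are not ASCII codes; ASCII ends at 127).
In Windows-1252 and Latin-1, ° is 176 and ³ is 0xB3 in hexadecimal, 179 in decimal. On an OEM console code page such as 850, the same characters are 248 (the \370 from the question, in octal) and 252 (\374).
For more, watch this.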
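As a rough sketch of the terminology: \370 is an octal escape, and 370 octal, F8 hexadecimal and 248 decimal all name the same byte. The snippet below assumes the console is on OEM code page 850, where ° is byte 248 and ³ is byte 252 (code page 437 has ° at the same position but no ³). The constant and function names are made up for illustration.

```cpp
#include <sstream>
#include <string>

// Byte values in OEM code page 850 (an assumption about the console;
// check the active page with the `chcp` command).
constexpr char kDegreeCp850 = '\370'; // octal 370 == hex F8 == decimal 248
constexpr char kCubedCp850  = '\374'; // octal 374 == hex FC == decimal 252

// Formats a temperature followed by the CP850 degree byte and " C".
std::string format_celsius_cp850(double value) {
    std::ostringstream out;
    out << value << kDegreeCp850 << " C";
    return out.str();
}

// Formats a volume followed by " m" and the CP850 superscript-three byte.
std::string cubic_metres_cp850(double value) {
    std::ostringstream out;
    out << value << " m" << kCubedCp850;
    return out.str();
}
```

Printing is then just cout << format_celsius_cp850(value) << endl; the bytes only show up as ° and ³ if the console really is on code page 850.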

Make your life easier and use Unicode. Using Unicode you don't need to explicitly encode non-ASCII characters, you just include them in your source as-is. This also makes your program independent from the code page of the console, which could be different in another country.
Steps needed:
Save your source file in a Unicode encoding (UTF-8 works well, UTF-16 works too but some version control software has issues with the latter).
At the beginning of your program, call _setmode(_fileno(stdout), _O_U16TEXT) once. This switches the standard output to UTF-16 encoding. UTF-16 is the preferred Windows encoding as the OS uses it internally, so no conversion overhead will occur.
Use std::wcout instead of std::cout everywhere. Never mix both.
Always use wide (UTF-16) string literals via the L prefix.
Make sure that a console font is selected that actually includes these symbols (very likely as these are quite common).
#include <iostream>
#include <io.h>
#include <fcntl.h>
int wmain(int argc, wchar_t* argv[])
{
    // Switch stdout encoding to UTF-16.
    _setmode(_fileno(stdout), _O_U16TEXT);
    // Output UTF-16 string literal.
    std::wcout << L"°³" << std::endl;
}

Related

Is it possible to cout an EM DASH on Linux and Windows? [duplicate]

I haven't been able to find a way to cout a '—' character, whether I put that in the cout statement like this: cout << "—"; or use char(151), the program prints out a fuzzy undefined character. Do you guys see anything wrong with my code? Is couting a EM DASH even possible?
Edit: I've also tried wcout << L"—"; and std::wcout << wchar_t(0x2014);. Those both print nothing in my terminal.
First of all, EM DASH is a Unicode character (just making sure you do know that).
Printing unicode characters depends on what you're printing to.
If you're printing to a Unix terminal (or an emulator), the terminal emulator is using an encoding that supports this character, and that encoding matches the compiler's execution encoding, then you can do what you just did above in your source code cout << "—";
If you're getting fuzzy undefined characters, it is possible that your terminal just doesn't support that character.
If you're in windows (where it is harder), you can do something like this (which is not portable):
#include <iostream>
#include <io.h>
#include <fcntl.h>
int main() {
    _setmode(_fileno(stdout), _O_U16TEXT);
    std::wcout << L"—";
}
There's no universal support for Unicode in C++ and in various terminals, so there won't be a portable solution.
The thing is that the Windows console uses code pages by default. It uses UTF-16 internally, but converts to and from the current console code page when interacting with the outside. So simply printing a UTF-16 code unit like std::wcout << wchar_t(0x2014); won't work without any prior setup. You need to either switch the console to UTF-8 by running chcp 65001, or switch stdout to UTF-16 with _setmode(_fileno(stdout), _O_U16TEXT); in code, before printing the character out with
std::wcout << L"—";
It will not always work, because Unicode support in the Windows console is weaker. In many cases the characters don't appear due to issues in the renderer or font, and are replaced with squares or question marks. In that case, copy the text out and paste it into any Unicode-aware text box, where it will be displayed properly.
If you're using Windows in English or another Western European language that uses code page 1252, then you can print the em dash, which is at code point 151 there (ISO-8859-1 has no em dash), simply with
cout << (char)151;
If it doesn't work then the console is not on code page 1252. You can change it to 1252 if possible, or look up the em dash in your code page (if available).
On Linux things are much simpler because UTF-8 is used by default. So you can output the string as normal without resorting to std::wcout
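std::cout << "—"; // the source and execution character sets must both be UTF-8
// or use std::cout << u8"—" to force the encoding (up to C++17; since C++20
// a u8 literal is a char8_t array and cannot be streamed to std::cout)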
In fact you'll often get surprising results if you use wide strings on Linux. std::wcout << L"—" often won't work, because wide output is converted through the current locale, which defaults to "C"
That said, the Windows 10 console now supports UTF-8 well and even allows using UTF-8 as the locale, so if you don't need to support Windows 7 there's a universal method to print any Unicode string (up to C++17; since C++20 a u8 literal is a char8_t array, so use a plain UTF-8 encoded literal or a cast instead):
std::cout << u8"—";

How can I use std::imbue to set the locale for std::wcout?

I am trying to use the std::locale mechanism in C++11 to count words in different languages. Specifically, I have std::wstringstream which contains the title of a famous Russian novel ("Crime and Punishment" in English). What I want to do is to use the appropriate locale (ru_RU.utf8 on my Linux machine) to read the stringstream, count the words and print the results. I should also probably note that my system is set to use the en_US.utf8 locale.
The desired result is this:
0: "Преступление"
1: "и"
2: "наказание"
I counted 3 words.
and the last word was "наказание"
That all works when I set the global locale, but not when I attempt to imbue the wcout stream. When I try that, I get this result instead:
0: "????????????"
1: "?"
2: "?????????"
I counted 3 words.
and the last word was "?????????"
Also, when I attempt to use a solution suggested in the comments (which can be activated by changing #define USE_CODECVT 0 to #define USE_CODECVT 1), I get the error mentioned in this other question.
Those interested in experimenting with the code, or with compiler settings or both may wish to use this live code.
My questions
Why does that not work? Is it because wcout is already open?
Is there a way to use imbue rather than setting the global locale to do what I want?
If it makes a difference, I'm using g++ 4.8.3. The full code is shown below.
getwords.cpp
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <locale>
#define USE_CODECVT 0
#define USE_IMBUE 1
#if USE_CODECVT
#include <codecvt>
#endif
using namespace std;
int main()
{
#if USE_CODECVT
    locale ru("ru_RU.utf8",
        new codecvt_utf8<wchar_t, 0x10ffff, consume_header>{});
#else
    locale ru("ru_RU.utf8");
#endif
#if USE_IMBUE
    wcout.imbue(ru);
#else
    locale::global(ru);
#endif
    wstringstream in{L"Преступление и наказание"};
    in.imbue(ru);
    wstring word;
    unsigned wordcount = 0;
    while (in >> word) {
        wcout << wordcount << ": \"" << word << "\"\n";
        ++wordcount;
    }
    wcout << "\nI counted " << wordcount << " words.\n"
          << "and the last word was \"" << word << "\"\n";
}
First I did some more tests using your code and I can confirm that L"Преступление и наказание" is a correct wide string. I checked the codes of the individual characters, and they are correctly 0x41f, 0x440, 0x435, 0x441, 0x442, 0x443, 0x43f, 0x43b, 0x435, 0x43d, 0x438, 0x435, 0x20, 0x438, 0x20, 0x43d, 0x430, 0x43a, 0x430, 0x437, 0x430, 0x43d, 0x438, 0x435
I could not find any reference about it, but it looks like simply calling imbue is not enough. imbue is a method of basic_ios, which is a base class of the types of cout and wcout. It does act on numeric conversions, but in all my tests it has no effect on the charset used for output.
By default, the locale used in a C++ (or C) program is ... the C locale which knows nothing about unicode. All printable ASCII characters (below 128) are outputted as is, and others are replaced with a ?. It is exactly what your program does.
To make it work correctly, you have to select a locale that knows about unicode characters with setlocale. Once this is done, you can change the numeric conversion by calling imbue, and as you selected a unicode charset all will be fine.
So provided your current locale uses an UTF-8 charset, you only have to add
setlocale(LC_ALL, "");
as first line in your program, and the output will be as expected :
0: "Преступление"
1: "и"
2: "наказание"
I counted 3 words.
and the last word was "наказание"
If your current locale does not use UTF-8, choose one that is installed on your system and that supports it. I used setlocale(LC_ALL, "fr_FR.UTF-8");, or even setlocale(LC_ALL, "en_US.UTF-8"); and both worked.
Edit :
In fact, the best way to correctly output unicode to screen is to use setlocale(LC_ALL, "");. It automatically adapts to the current charset. I tested with a stripped down variant using Latin1 charset (my system speaks natively french and not russian ...)
#include <iostream>
#include <locale>
using namespace std;
int main() {
    setlocale(LC_ALL, "");
    wchar_t ws[] = { 0xe8, 0xe9, 0 };
    wcout << ws << endl;
}
I tried it under Linux using the UTF-8 charset and ISO-8859-1 (latin1) (resp. export LANG=fr_FR.UTF-8 and export LANG=fr_FR.ISO-8859-1) and correctly got èé in the proper charset. I also tried it under Windows XP, with code page 850 (OEM) and 1252 (ANSI) (resp. chcp 850 and chcp 1252, with the Lucida Console font), and got èé on the console too.
Edit 2 :
Of course, you can also set a global C++ locale with locale::global(locale("")); for the default locale or locale::global(locale("ru_RU.UTF-8")); for a Russian locale, but this does more than simply calling setlocale. According to the documentation of the GNU implementation of the C++ Standard Library about locale: "there is only one relation (of the C++ locale mechanism) to the C locale mechanism: the global C locale is modified if a named C++ locale object is set as the global locale", that is: std::locale::global(std::locale("")); affects the C functions as if the following call was made: std::setlocale(LC_ALL, "");. On the other hand, there is no vice versa, that is, calling setlocale has no effect whatsoever on the C++ locale mechanism, in particular on the working of locale("").
So it really looks like there is an underlying C library mechanism that must first be enabled with setlocale to allow the imbue conversion to work correctly.
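The observation that imbue does act on numeric conversions can be checked without any system locale installed. A minimal sketch, using a made-up facet (CommaDecimal) and helper (format_with_comma) that are not part of the original code:

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet for illustration: decimal comma, default (no) grouping.
struct CommaDecimal : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// Formats a double through a stream imbued with the facet above.
std::string format_with_comma(double value) {
    std::ostringstream out;
    // The locale object takes ownership of the facet pointer.
    out.imbue(std::locale(std::locale::classic(), new CommaDecimal));
    out << value;
    return out.str();
}
```

Here format_with_comma(3.5) yields "3,5", which shows imbue changing number formatting on that one stream while the character encoding of the output is left untouched.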
In this answer, I'm taking the questions in reverse order, and adding another (with answer) that came up along the way.
Is there a way to use imbue rather than setting the global locale to do what I want?
Yes. By default, std::wcout is synchronized to the underlying stdout C stream. So std::wcout can use imbue if that synchronization is turned off, allowing the C++ stream to operate independently. So to modify the original code to use imbue and work as intended only a single line need be added, calling std::ios_base::sync_with_stdio:
std::ios_base::sync_with_stdio(false);
std::wcout.imbue(ru);
Why didn't the original version work?
The standard (I'm referring to INCITS/ISO/IEC 14882-2011[2012]) says very little about the tie to the underlying stdio stream, but in 27.4.3 it says
The object wcout controls output to a stream buffer associated with the object stdout, declared in <cstdio>
Further, without explicitly setting a global locale, the locale is the "C" locale which is US English ASCII, so this appears to imply that stdout will, by default, have an ASCII mapping. Since no Cyrillic characters are represented in ASCII, the underlying stdout is what converts the proper Russian into a series of ? characters.
Why must the sync_with_stdio call precede imbue?
According to 27.5.3.4 of the standard:
If any input or output operation has occurred using the standard streams prior to the call,
the effect is implementation-defined. Otherwise, called with a false argument, it allows the standard streams to operate independently of the standard C streams.
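Putting the two calls in order, a sketch might look like this. The helper names locale_or_classic and imbue_wcout are hypothetical, and falling back to the classic locale when the named one is not installed is my assumption, not something the question asked for:

```cpp
#include <iostream>
#include <locale>
#include <stdexcept>
#include <string>

// Returns the named locale, or the classic "C" locale if the name is unknown
// on this system (std::locale's constructor throws std::runtime_error then).
std::locale locale_or_classic(const std::string& name) {
    try {
        return std::locale(name.c_str());
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Must be called before any input or output on the standard streams.
void imbue_wcout(const std::locale& loc) {
    std::ios_base::sync_with_stdio(false); // detach from C stdio first
    std::wcout.imbue(loc);                 // then the imbued locale is used
}
```

It would be called as imbue_wcout(locale_or_classic("ru_RU.utf8")); at the very top of main, before anything is written.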
I don't know what languages you're planning on supporting, but there are languages where your algorithm doesn't apply, eg. Japanese. I suggest checking out the word iterators in International Components for Unicode. http://userguide.icu-project.org/boundaryanalysis

Printing UTF8 characters to linux console using C++

I am trying to get the unicode character macron (U+00AF), i.e., an overscore, to print consistently on various linux consoles. So far, some consoles work (e.g., putty ssh), others do not (e.g., ubuntu shell), and I have not been able to figure out what I am doing right in one case (probably luck) and wrong in the other.
I do know the basics of Unicode and Utf8, but I have not been able to figure out how to consistently get consoles to display the appropriate characters.
Any suggestions? Note that this is explicitly for unix consoles - all of the similar questions I have found focused on Windows-specific console commands.
Here is what I would effectively like to get working:
wchar_t post = L'¯'; //0xC2AF
std::wcout << post << std::endl;
Unfortunately nothing I tried or could find in the way of suggestions consistently displayed the appropriate character, so I ended up using an ASCII hyphen '-' as a close enough match.
The solution is to put it into the stream as a multibyte (UTF-8) string:
std::string s = "\u00AF"; // U+00AF; becomes the bytes C2 AF when the execution charset is UTF-8
std::cout << s << std::endl;
or to set a locale using the
char* setlocale( int category, const char* locale);
function:
std::string old_locale = setlocale(LC_ALL, nullptr); // remember the current C locale
setlocale(LC_ALL, "en_US.UTF-8");
wchar_t w = 0x00AF; // the code point U+00AF, not its UTF-8 bytes C2 AF
std::wcout << w << std::endl;
setlocale(LC_ALL, old_locale.c_str()); // restore locale
The final result is however dependent on many user settings (console, fonts, etc.), so there is no guarantee that it will be OK.
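The value 0xC2AF that appears in the question is the pair of UTF-8 bytes (C2 AF) for the macron, not its code point (U+00AF). To make the difference concrete, here is a small hand-written encoder; to_utf8 is a name chosen for this sketch, not a standard function:

```cpp
#include <string>

// Encodes one Unicode code point as UTF-8. Returns an empty string for
// surrogates and for values above U+10FFFF.
std::string to_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out; // surrogate, invalid
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

to_utf8(0x00AF) gives the two bytes C2 AF, whereas to_utf8(0xC2AF) gives the three bytes EC 8A AF, which is a Hangul syllable rather than a macron.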

printing Unicode characters C++

I'm trying to write a simple command line app to teach myself Japanese, but can't seem to get Unicode characters to print. What am I missing?
#include <iostream>
using namespace std;
int main()
{
    wcout << L"こんにちは世界\n";
    wcout << L"Hello World\n";
    system("pause");
}
In this example only "Press any key to continue" is displayed. Tested on Visual C++ 2013.
This is not easy on Windows. Even when you manage to get the text to the Windows console you still need to configure cmd.exe to be able to display Japanese characters.
#include <iostream>
int main() {
    std::cout << "こんにちは世界\n";
}
This works fine on any system where:
The compiler's source and execution encodings include the characters.
The output device (e.g., the console) expects text in the same encoding as the compiler's execution encoding.
A font with the appropriate characters is available (usually not a problem).
Most platforms these days use UTF-8 by default for all these encodings and so can support the entire Unicode range with code similar to the above. Unfortunately Windows is not one of these platforms.
wcout << L"こんにちは世界\n";
In this line the string literal data is (at compile time) converted from the source encoding to the execution wide encoding and then (at run time) wcout uses the locale it is imbued with to convert the wchar_t data to char data for output. Where things go wrong is that the default locale is only required to support characters from the basic source character set, which doesn't even include all ASCII characters, let alone non-ASCII characters.
So the conversion results in an error, putting wcout into a bad state. The error has to be cleared before wcout will function again, which is why the second print statement does not output anything.
You can work around this for a limited range of characters by imbuing wcout with a locale that will successfully convert the characters. Unfortunately the encoding that is needed to support the entire Unicode range this way is UTF-8; although Microsoft's implementation of streams supports other multibyte encodings, it very specifically does not support UTF-8.
For example:
wcout.imbue(std::locale(std::locale::classic(), new std::codecvt_utf8_utf16<wchar_t>()));
SetConsoleOutputCP(CP_UTF8);
wcout << L"こんにちは世界\n";
Here wcout will correctly convert the string to UTF-8, and if the output were written to a file instead of the console then the file would contain the correct UTF-8 data. However the Windows console, even though configured here to accept UTF-8 data, simply will not accept UTF-8 data written in this way.
There are a few options:
Avoid the standard library entirely:
DWORD n;
WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), L"こんにちは世界\n", 8, &n, nullptr);
Use non-standard magical incantation that will break standard code:
#include <fcntl.h>
#include <io.h>
_setmode(_fileno(stdout), _O_U8TEXT);
std::wcout << L"こんにちは世界\n";
After setting this mode std::cout << "Hello, World"; will crash.
Use a low level IO API along with manual conversion:
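#include <codecvt>
#include <cstdio>
#include <locale>
#include <windows.h>
SetConsoleOutputCP(CP_UTF8);
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
// to_bytes returns a std::string; fputs avoids the extra newline that puts appends
std::fputs(convert.to_bytes(L"こんにちは世界\n").c_str(), stdout);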
Using any of these methods, cmd.exe will display the correct text to the best of its ability, by which I mean it will display unreadable boxes. Seven little boxes, for the given string.
You can copy the text out of cmd.exe and into notepad.exe or whatever to see the correct glyphs.
There's a whole article about dealing with Unicode in Windows console
http://alfps.wordpress.com/2011/11/22/unicode-part-1-windows-console-io-approaches/
http://alfps.wordpress.com/2011/12/08/unicode-part-2-utf-8-stream-mode/
Basically, you may implement your own streambuf for std::cout (or std::wcout) in terms of WriteConsoleW and enjoy writing UTF-8 (or whatever Unicode you want) to the Windows console without depending on locales or console code pages, and even without using wide characters.
It may not look very straightforward, but it's convenient and reusable solution, which is also able to give you a portable utf8-everywhere style user code. Please, don't beat me for my English :)
Or you can change Windows locale to Japanese.

C++ spanish question mark

I am just starting out in C++ and I am developing a simple console calculator. When my program asks the user whether they want to exit, the character '¿' doesn't appear (questions in Spanish are written between '¿' and '?').
Can someone help me?
PS: The problem only happens on Windows, not on Linux.
EDIT: Here is the code that produces the output:
cout << '¿' <<"Desea salir (S/N)? " ;
There are a few ways to deal with this problem.
The fundamental problem is not that the ¿ doesn't exist in the console, but that the console and your C++ text editor disagree on what that character is. The two are using different character codes for many characters beyond those needed for English. Character codes 32-126 (letters, numbers, punctuation and brackets), are universally the same. However, character codes 128 through 255, which from a Spanish point of view includes all the accented characters, "u with diaeresis" (e.g. "pingüino"), Ñ, and the starting ¿ and ¡, depend on the specific environment.
The reason for such an inconvenient disagreement in character codes is a historical accident, interesting on its own but out of the scope of this question. To keep it simple: in the Windows OS, "consoles" (typically) use the list of characters described in OEM Code Page 437, while Windows applications like your C++ editor (typically) use the Windows-1252 Code Page.
There is no portable (universal) solution for this problem, because the issue of differing charsets is a platform-specific problem. Windows is unfortunately somewhat unique in that the editor and (console) outputs use different sets.
The first and simplest solution - which is fine for toy programs - is to just look up the character code that you want from the OEM 437 code-page, and use that. For ¿, that's #168 (0xa8 in hex, or \250 in octal). You can just embed the character code in the string to make clear what you're trying to do, either of these:
std::cout << "\x0a8" "Cu" "\x0a0" "l es el primer n" "\x0a3" "mero?\n"; // hex
std::cout << "\250Cu\240l es el primer n\243mero?\n"; // octal
Outputs:
¿Cuál es el primer número?
Note how I had to do the same thing with the ú and the á. Unfortunately, writing strings like this gets unwieldy quickly. Using macros or const chars can help, but not much.
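One way to keep the code page lookup out of the string literals is a small table. This is only a sketch: cp437_byte and to_cp437 are invented names, the table covers just the lowercase Spanish letters and a few others, and the input is given as Unicode code points (a std::u32string) so the result does not depend on how the source file is saved.

```cpp
#include <string>

// Maps one Unicode code point to its byte in OEM code page 437. ASCII passes
// through; anything missing from this deliberately partial table becomes '?'.
char cp437_byte(char32_t cp) {
    if (cp < 0x80) return static_cast<char>(cp);
    switch (cp) {
        case 0x00E1: return '\xA0'; // á
        case 0x00E9: return '\x82'; // é
        case 0x00ED: return '\xA1'; // í
        case 0x00F3: return '\xA2'; // ó
        case 0x00FA: return '\xA3'; // ú
        case 0x00FC: return '\x81'; // ü
        case 0x00F1: return '\xA4'; // ñ
        case 0x00D1: return '\xA5'; // Ñ
        case 0x00BF: return '\xA8'; // ¿
        case 0x00A1: return '\xAD'; // ¡
        default:     return '?';
    }
}

// Converts a whole string of code points to CP437 bytes.
std::string to_cp437(const std::u32string& text) {
    std::string out;
    out.reserve(text.size());
    for (char32_t cp : text) out += cp437_byte(cp);
    return out;
}
```

The question from the example then becomes std::cout << to_cp437(U"\u00BFCu\u00E1l es el primer n\u00FAmero?\n"); and produces the same bytes as the hex and octal versions.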
A second alternative is to use a Windows function such as CharToOemA. For example (see note 1 below):
#include <windows.h>
...
...
char pregunta[] = "¿Cuál es el primer número\n";
char *pregunta_oem = new char[sizeof(pregunta)/sizeof(char)];
CharToOemA(pregunta, pregunta_oem);
std::cout << pregunta_oem;
delete []pregunta_oem;
For a more complex program, I would wrap that pattern into a utility function or class.
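Such a utility function could look roughly like this. The name to_oem is made up, and the sketch uses CharToOemBuffA (the length-taking sibling of CharToOemA) so that it can write straight into a std::string; outside Windows it simply returns the text unchanged, which is an assumption made so the same code compiles everywhere.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Converts text from the ANSI code page to the OEM code page of the console.
std::string to_oem(const std::string& ansi_text) {
#ifdef _WIN32
    std::string oem(ansi_text.size(), '\0');
    if (!ansi_text.empty()) {
        CharToOemBuffA(ansi_text.data(), &oem[0],
                       static_cast<DWORD>(ansi_text.size()));
    }
    return oem;
#else
    return ansi_text; // no ANSI/OEM split to bridge on other platforms
#endif
}
```

Usage is then std::cout << to_oem("¿Cuál es el primer número?\n"); with no manual new/delete, provided the source file is saved in the ANSI code page as in the example above.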
A different approach is to change the Code Page of the console, so that it agrees with your C++ editor and the rest of Windows. You can do that via the CHCP console command, or via the SetConsoleOutputCP() function, but that doesn't work on the default "raster font" used by consoles, so you have to change the font as well. When the font is set to a unicode font like Lucida Console, this works:
std::cout << "¿Cuál es el primer número?\n"; // ┐Cußl es el...
UINT originalCP = GetConsoleOutputCP();
SetConsoleOutputCP(1252);
std::cout << "¿Cuál es el primer número?\n"; // ¿Cuál es el...
SetConsoleOutputCP(originalCP);
(I don't know if you can change the font from the program itself; I have to look that up. The standard way to do it from the console is to click on the tiny icon on the corner, click Properties, Font tab, and pick a font from the list).
1 I have to warn that this snippet contains a number of subtleties that can easily trip a beginner. You have to make sure the source of the text is a char array; if you're using a char pointer, sizeof won't work correctly and you have to use strlen(source)+1. For the source I used the natural option of a char array initialized from a literal. Such an array is a writable copy of the literal, so it could also serve as the destination (CharToOemA allows the source and destination to be the same buffer); what you must not write to is a string literal reached through a pointer, because literals themselves are read-only. This example feels very C-like.
You can use _setmode function to do that :
#include <cstdlib> // for system()
#include <iostream>
#include <string>
#if defined(_WIN32) && !defined(UNIX)
# include <io.h> // for _setmode()
# include <fcntl.h> // for _O_U16TEXT
#endif // _WIN32 && !UNIX
int main()
{
#if defined(_WIN32) && !defined(UNIX)
    // _WIN32 is predefined by the compiler; WIN32 depends on project settings.
    _setmode(_fileno(stdout), _O_U16TEXT);
#endif // _WIN32 && !UNIX
    std::wstring wstr = L"'¿' and '?'";
    std::wcout << L"WString : " << wstr << std::endl;
    system("pause");
    return 0;
}
To write UNICODE chars (assuming LE is the standard Windows variant of UTF-16...) out with the iostream library, call _setmode() with _O_U16TEXT and then use wcout.
But you can't use cout anymore. It throws an assert.
Check this answer.
Assuming you are using simple call to std::cout, you should be able to print Unicode strings, if you set your command line to Unicode mode:
1. Change code page to UTF-8
You can do this by simply calling the command below in your cmd:
chcp 65001
2. Make sure you are using a font which has the characters you want to display
Lucida Console should do the trick, as it supports ¿ (and other characters included in WGL4).
This character is simply not included in basic ASCII. Try using wstring http://www.cplusplus.com/reference/string/wstring/
As you can see in the extended (OEM code page 437/850) character table, the symbol ¿ has the code 168; it is not part of ASCII, which ends at 127. You can use \ddd (octal) in the output string to print such a special character; 168 is \250.
This is because the command console does not support non-ASCII characters by default (ASCII has only the English letters and no accented characters). To get support for other characters, play around with the chcp command. Refer to its documentation here.
In your case I think you need to run chcp 850 in the console before running your program.