I am fairly new to programming, but it seems that the π (pi) symbol is not part of the character set that ASCII covers.
I am wondering if there is a way to get the console to output the π symbol, so as to express exact answers regarding certain mathematical formulas.
I'm not really sure about any other methods (such as those that use the STL) but you can do this with Win32 using WriteConsoleW:
#include <windows.h>
int main()
{
    HANDLE hConsoleOutput = GetStdHandle(STD_OUTPUT_HANDLE);
    LPCWSTR lpPiString = L"\u03C0"; // U+03C0, Greek small letter pi
    DWORD dwNumberOfCharsWritten = 0;
    WriteConsoleW(hConsoleOutput, lpPiString, 1, &dwNumberOfCharsWritten, NULL);
}
The Microsoft CRT is not very Unicode-savvy, so it may be necessary to bypass it and use WriteConsole() directly. I'm assuming you already compile for Unicode; otherwise you need to call WriteConsoleW() explicitly.
I'm in the learning phase of this, so correct me if I get something wrong.
It seems like this is a three step process:
Use wide versions of cout, cin, string and so on. So: wcout, wcin, wstring
Before using a stream, set it to a Unicode-friendly mode.
Configure the targeted console to use a Unicode-capable font.
You should now be able to rock those funky åäös.
Example:
#include <iostream>
#include <string>
#include <io.h>
#include <fcntl.h>
// fcntl.h defines the translation modes. This example only needs _O_WTEXT
// (UTF-16, translated); _O_U8TEXT (UTF-8 without BOM, translated) is
// possibly useful if we want UTF-8 instead.
int main()
{
// To be able to write UTF-16 to stdout.
_setmode(_fileno(stdout), _O_WTEXT);
// To be able to read UTF-16 from stdin.
_setmode(_fileno(stdin), _O_WTEXT);
const wchar_t* hallå = L"Hallå, värld!";
std::wcout << hallå << std::endl;
// It's all Greek to me. Go UU!
std::wstring etabetapi = L"η β π";
std::wcout << etabetapi << std::endl;
std::wstring myInput;
std::wcin >> myInput;
std::wcout << myInput << L" has " << myInput.length() << L" characters." << std::endl;
// This character won't show using Consolas or Lucida Console
std::wcout << L"♔" << std::endl;
}
I've been trying to write a C++ application for a project and I ran into this issue. Basically:
class OBSClass
{
public:
wstring ClassName;
uint8_t Credit;
uint8_t Level;
OBSClass() : ClassName(), Credit(), Level() {}
OBSClass(wstring name, uint8_t credit, uint8_t hyear)
: ClassName(name), Credit(credit), Level(hyear)
{}
};
In some other file:
vector<OBSClass> AllClasses;
...
AllClasses.push_back(OBSClass(L"Bilişim Sistemleri Mühendisliğine Giriş", 3, 1));
AllClasses.push_back(OBSClass(L"İş Sağlığı ve Güvenliği", 3, 1));
AllClasses.push_back(OBSClass(L"Türk Dili 1", 2, 1));
... (rest omitted; some of the entries have non-ASCII characters like 'ş' and 'İ')
I have a function that basically outputs everything in AllClasses. The problem is that wcout does not produce the desired output.
void PrintClasses()
{
for (size_t i = 0; i < AllClasses.size(); i++)
{
wcout << "Class: " << AllClasses[i].ClassName << "\n";
}
}
The output is 'Class: Bili' and nothing else. The program does not even try to output the other entries and just hangs. I am on Windows using G++ 6.3.0. I am not using Windows' cmd; I am using bash from MinGW, so encoding should not be a problem (or should it?). Any advice?
Edit: Source code encoding is not a problem either. I just checked, and it is UTF-8, the default in VSCode.
Edit: I also checked whether the problem is with string literals.
wstring test;
wcin >> test;
wcout << test;
I entered some non-ASCII characters like 'ö' and 'ş', and it works perfectly. What is the problem with wide string literals?
Edit: Here you go
#include <iostream>
#include <string>
#include <vector>
using namespace std;
vector<wstring> testvec;
int main()
{
testvec.push_back(L"Bilişim Sistemleri Mühendisliğine Giriş");
testvec.push_back(L"ıiÖöUuÜü");
testvec.push_back(L"☺☻♥♦♣♠•◘○");
for (size_t i = 0; i < testvec.size(); i++)
wcout << testvec[i] << "\n";
return 0;
}
Compile with G++:
g++ file.cc -O3
This code only outputs 'Bili'. It must be something to do with g++ getting the binary encoding wrong (?), since entering values with wcin and then outputting them with wcout does not cause any problem.
The following code works for me, using MinGW-w64 7.3.0 in both MSYS2 Bash, and Windows CMD; and with the source encoded as UTF-8:
#include <iostream>
#include <locale>
#include <string>
#include <codecvt>
int main()
{
std::ios_base::sync_with_stdio(false);
std::locale utf8( std::locale(), new std::codecvt_utf8_utf16<wchar_t> );
std::wcout.imbue(utf8);
std::wstring w(L"Bilişim Sistemleri Mühendisliğine Giriş");
std::wcout << w << '\n';
}
Explanation:
When it is written to as a byte stream, the Windows console doesn't support any sort of 16-bit output; it only supports ANSI, plus partial UTF-8. ANSI is the default for backwards-compatibility purposes, though Windows 10 1803 does add an option to set that to UTF-8. So you need to configure wcout to convert its output to UTF-8.
Imbuing wcout with a codecvt_utf8_utf16 facet achieves this. However, you also need to disable sync_with_stdio; otherwise the stream doesn't even use the facet and just defers to stdout, which has a similar problem.
For writing to other files, I found that the same technique works for writing UTF-8. To write a UTF-16 file, you need to imbue the wofstream with a UTF-16 facet and manually write a BOM.
Commentary: Many people just avoid trying to use wide iostreams completely, due to these issues.
You can write a UTF-8 file using a narrow stream and, if you are using wstring internally, call a function in your code to convert wstring to UTF-8. You can of course also use UTF-8 internally.
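A minimal sketch of such a conversion function, assuming the wide string holds UTF-16 (as on Windows) or UTF-32 (as on most Unix-like systems). The name to_utf8 is hypothetical, and an unpaired surrogate is simply encoded as it stands rather than rejected:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of any library): converts a wide string to
// UTF-8. Works with 16-bit wchar_t (UTF-16, including surrogate pairs) and
// with 32-bit wchar_t (UTF-32).
std::string to_utf8(const std::wstring& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(in[i]);
        // Combine a UTF-16 surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            const std::uint32_t lo = static_cast<std::uint32_t>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        assert(cp <= 0x10FFFF); // outside this range it is not Unicode
        if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With this, something like std::cout << to_utf8(L"\u03C0") writes the two bytes 0xCF 0x80, which show up as π wherever the output is interpreted as UTF-8.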
Of course you can also write a UTF-16 file using a narrow stream, just not with operator<< from a wstring.
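As a rough sketch of what that could look like, the code below writes a BOM and then each code unit as two little-endian bytes through a plain std::ostream. The helper names are hypothetical, and the stream is assumed to be opened in binary mode so that no newline translation corrupts the bytes:

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes one 16-bit code unit in little-endian order.
static void put_u16le(std::ostream& os, std::uint16_t unit)
{
    const char bytes[2] = { static_cast<char>(unit & 0xFF),
                            static_cast<char>(unit >> 8) };
    os.write(bytes, 2);
}

// Hypothetical helper: writes a BOM followed by the text as UTF-16LE.
void write_utf16le(std::ostream& os, const std::wstring& text)
{
    put_u16le(os, 0xFEFF); // byte order mark
    for (wchar_t wc : text) {
        std::uint32_t cp = static_cast<std::uint32_t>(wc);
        assert(cp <= 0x10FFFF);
        if (cp >= 0x10000) { // only possible when wchar_t is 32 bits wide
            cp -= 0x10000;
            put_u16le(os, static_cast<std::uint16_t>(0xD800 + (cp >> 10)));
            put_u16le(os, static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            put_u16le(os, static_cast<std::uint16_t>(cp));
        }
    }
}

// Convenience wrapper for inspecting the resulting bytes in memory.
std::string utf16le_bytes(const std::wstring& text)
{
    std::ostringstream os;
    write_utf16le(os, text);
    return os.str();
}
```

For a real file, the same write_utf16le call would take a std::ofstream opened with std::ios::binary.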
If you have at least Windows 10 1903 (May 2019) and at least Windows Terminal 0.3.2142 (Aug 2019), set the OEM code page to UTF-8 with this registry file:
Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage]
"OEMCP"="65001"
and restart. After that you can use this:
#include <iostream>
#include <string>
int main() {
std::string a[] = {
"Bilişim Sistemleri Mühendisliğine Giriş",
"Türk Dili 1",
"İş Sağlığı ve Güvenliği",
"ıiÖöUuÜü",
"☺☻♥♦♣♠•◘○"
};
for (const auto& s : a) {
std::cout << s << std::endl;
}
}
I'm trying to output a UTF-8 encoded string with cout, with no success. I'd like to use Boost.Locale in my program. I've found some information specific to the Windows console. For example, this article http://www.boost.org/doc/libs/1_60_0/libs/locale/doc/html/running_examples_under_windows.html says that I should set the console output code page to 65001 and save all my sources in UTF-8 encoding with a BOM. So, here is my simple example:
#include <windows.h>
#include <cstdio>
#include <iostream>
#include <boost/locale.hpp>
using namespace std;
using namespace boost::locale;
int wmain(int argc, const wchar_t* argv[])
{
//system("chcp 65001 > nul"); // It's the same as SetConsoleOutputCP(CP_UTF8)
SetConsoleOutputCP(CP_UTF8);
locale::global(generator().generate(""));
static const char* utf8_string = u8"♣☻▼►♀♂☼";
cout << "cout: " << utf8_string << endl;
printf("printf: %s\n", utf8_string);
return 0;
}
I compile it with Visual Studio 2015 and it produces the following output in console:
cout: ���������������������
printf: ♣☻▼►♀♂☼
Why does printf handle it well while cout doesn't? Can the locale generator of Boost help with this? Or should I use something else to print UTF-8 text to the console in stream mode (a cout-like approach)?
It looks like std::cout is much too clever here: it tries to interpret your UTF-8 encoded string as an ASCII one and finds 21 non-ASCII characters, which it outputs as the unmapped character �. AFAIK, the Windows C++ console driver insists on each character from a narrow char string being mapped to a position on screen and does not support multi-byte character sets.
Here is what happens under the hood:
utf8_string is the following char array (just look at a Unicode table and do the utf8 conversion):
utf8_string = { '\xe2', '\x99', '\xa3', '\xe2', '\x98', '\xbb', '\xe2', '\x96',
'\xbc', '\xe2', '\x96', '\xba', '\xe2', '\x99', '\x80', '\xe2', '\x99',
'\x82', '\xe2', '\x98', '\xbc', '\0' };
That is 21 chars (bytes), none of which is in the ASCII range 0-0x7f.
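To make the byte count concrete, here is a small sketch that counts code points by skipping UTF-8 continuation bytes (those of the form 10xxxxxx). The function and variable names are made up for illustration, and the input is assumed to be well-formed UTF-8:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string: every byte that is not a
// continuation byte (bit pattern 10xxxxxx) starts a new code point.
// Assumes well-formed input.
std::size_t utf8_code_points(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    assert(count <= s.size()); // never more code points than bytes
    return count;
}

// The same byte sequence as utf8_string above, without the terminator.
const std::string utf8_bytes =
    "\xe2\x99\xa3\xe2\x98\xbb\xe2\x96\xbc\xe2\x96\xba"
    "\xe2\x99\x80\xe2\x99\x82\xe2\x98\xbc";
```

Here utf8_bytes.size() is 21 while utf8_code_points(utf8_bytes) is 7: seven symbols of three bytes each, which is why a byte-by-byte interpretation yields 21 replacement characters.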
printf, on the other hand, just outputs the bytes without any conversion, giving the correct output.
I'm sorry, but even after many searches I could not find an easy way to correctly display UTF-8 output on a Windows console using a narrow stream such as std::cout.
But you should notice that your code fails to imbue the Boost locale into cout.
The key problem is that the implementation of cout << "some string", after long and painful adventures, calls WriteFile once for every character.
If you'd like to debug it, set a breakpoint inside the _write function in the write.c file of the CRT sources, write something to cout, and you'll see the whole story.
So we can rewrite your code
static const char* utf8_string = u8"♣☻▼►♀♂☼";
cout << utf8_string << endl;
with an equivalent (and faster!) one:
static const char* utf8_string = u8"♣☻▼►♀♂☼";
const size_t utf8_string_len = strlen(utf8_string);
DWORD written = 0;
for(size_t i = 0; i < utf8_string_len; ++i)
WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), utf8_string + i, 1, &written, NULL);
output: ���������������������
Replace the loop with a single call to WriteFile and the UTF-8 console output becomes correct:
static const char* utf8_string = u8"♣☻▼►♀♂☼";
const size_t utf8_string_len = strlen(utf8_string);
DWORD written = 0;
WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), utf8_string, static_cast<DWORD>(utf8_string_len), &written, NULL);
output: ♣☻▼►♀♂☼
I tested it on MSVC 2013 and MSVC .NET (2003); both behave identically.
Obviously the Windows console implementation wants whole characters in each call to WriteFile/WriteConsole and cannot take a UTF-8 character one byte at a time. :)
What can we do here?
My first idea is to make the output buffered, as it is for files. It's easy:
static char cout_buff[128];
cout.rdbuf()->pubsetbuf(cout_buff, sizeof(cout_buff));
cout << utf8_string << endl; // works
cout << utf8_string << endl; // does nothing
output: ♣☻▼►♀♂☼ (only once; I explain why below)
The first issue is that console output becomes delayed: it waits until the end of the line or until the buffer overflows.
The second issue: it doesn't work.
Why? After the first buffer flush (at the first << endl), cout switches to a bad state (badbit set). That's because WriteFile normally returns the number of bytes written in *lpNumberOfBytesWritten, but for a UTF-8 console it returns the number of characters written. The CRT detects that the number of bytes requested and the number reported as written differ, and stops writing to the 'failed' stream.
What else can we do?
Well, I suppose we could implement our own std::basic_streambuf that writes to the console the correct way, but it's not easy and I have no time for it. If anyone wants to do it, I'll be glad.
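For what it's worth, the core of such a stream buffer could look roughly like the sketch below. It is not a tested Windows solution: the class name and the sink callback are hypothetical, and the sink stands in for whatever single WriteFile or WriteConsole call would actually be used. The idea is only that bytes are collected and handed over in one piece per flush, so a multi-byte UTF-8 sequence is never split across calls:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ostream>
#include <streambuf>
#include <string>
#include <utility>

// Sketch: a stream buffer that collects everything written to it and passes
// the pending bytes to a sink in a single call on every flush.
class ChunkedSinkBuf : public std::streambuf {
public:
    using Sink = std::function<void(const std::string&)>;

    explicit ChunkedSinkBuf(Sink sink) : sink_(std::move(sink))
    {
        assert(sink_ && "a sink callback is required");
    }

protected:
    // Called for single characters, because no put area is set up.
    int_type overflow(int_type ch) override
    {
        if (!traits_type::eq_int_type(ch, traits_type::eof())) {
            pending_.push_back(traits_type::to_char_type(ch));
        }
        return traits_type::not_eof(ch);
    }

    // Called for character sequences such as string literals.
    std::streamsize xsputn(const char* s, std::streamsize n) override
    {
        pending_.append(s, static_cast<std::size_t>(n));
        return n;
    }

    // Called by std::flush and std::endl: emit all pending bytes at once.
    int sync() override
    {
        if (!pending_.empty()) {
            sink_(pending_);
            pending_.clear();
        }
        return 0;
    }

private:
    Sink sink_;
    std::string pending_;
};
```

A std::ostream constructed on top of this buffer (or cout after rdbuf() has been swapped) would then reach the console once per flush. Anything still pending when the buffer is destroyed is lost in this sketch, so the stream has to be flushed explicitly.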
Other options are (a) to use std::wcout with strings of wchar_t characters, or (b) to use WriteFile/WriteConsole directly. Sometimes those solutions are acceptable.
Working with a UTF-8 console in Microsoft's C++ implementations is really horrible.
Consider the following code snippet, compiled as a Console Application on MS Visual Studio 2010/2012 and executed on Win7:
#include "stdafx.h"
#include <iostream>
#include <string>
const std::wstring test = L"hello\xf021test!";
int _tmain(int argc, _TCHAR* argv[])
{
std::wcout << test << std::endl;
std::wcout << L"This doesn't print either" << std::endl;
return 0;
}
The first wcout statement outputs "hello" (instead of something like "hello?test!")
The second wcout statement outputs nothing.
It's as if 0xf021 (and other?) Unicode characters cause wcout to fail.
This particular Unicode character, 0xf021 (encoded as UTF-16), is part of the "Private Use Area" in the Basic Multilingual Plane. I've noticed that Windows Console applications do not have extensive support for Unicode characters, but typically each character is at least represented by a default character (e.g. "?"), even if there is no support for rendering a particular glyph.
What is causing the wcout stream to choke? Is there a way to reset it after it enters this state?
wcout, or to be precise, a wfilebuf instance it uses internally, converts wide characters to narrow characters, then writes those to the file (in your case, to stdout). The conversion is performed by the codecvt facet in the stream's locale; by default, that just does wctomb_s, converting to the system default ANSI codepage, aka CP_ACP.
Apparently, character '\xf021' is not representable in the default codepage configured on your system. So the conversion fails, and failbit is set in the stream. Once failbit is set, all subsequent calls fail immediately.
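The sticky behaviour of failbit can be reproduced on any platform with an in-memory stream. In this sketch the failure is set by hand with setstate as a stand-in for the failed conversion, and the function name is made up for illustration:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Simulates the situation on an in-memory wide stream: once failbit is set,
// later insertions are ignored until clear() resets the state.
std::wstring write_after_failure(bool reset_first)
{
    std::wostringstream out;
    out << L"hello";
    out.setstate(std::ios_base::failbit); // stand-in for the conversion error
    if (reset_first) {
        out.clear(); // back to goodbit, so writing works again
    }
    out << L"test!";
    assert(out.good() == reset_first);
    return out.str();
}
```

Calling write_after_failure(false) returns just L"hello", mirroring what the question observes, while write_after_failure(true) returns L"hellotest!".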
I do not know of any way to get wcout to successfully print arbitrary Unicode characters to console. wprintf works though, with a little tweak:
#include <fcntl.h>
#include <io.h>
#include <string>
const std::wstring test = L"hello\xf021test!";
int _tmain(int argc, _TCHAR* argv[])
{
_setmode(_fileno(stdout), _O_U16TEXT);
wprintf(L"%ls", test.c_str());
return 0;
}
Setting the mode for stdout to _O_U16TEXT will allow you to write Unicode characters to the wcout stream as well as wprintf. (See Conventional wisdom is retarded, aka What the ##%&* is _O_U16TEXT?) This is the right way to make this work.
_setmode(_fileno(stdout), _O_U16TEXT);
std::wcout << L"hello\xf021test!" << std::endl;
std::wcout << L"\x043a\x043e\x0448\x043a\x0430 \x65e5\x672c\x56fd" << std::endl;
std::wcout << L"Now this prints!" << std::endl;
It shouldn't be necessary anymore, but you can reset a stream that has entered an error state by calling clear:
if (std::wcout.fail())
{
std::wcout.clear();
}
I have a problem with wide stream output. My primary concern is wofstream, but wcout doesn't work properly either.
Neither produces any output beyond Latin characters.
That is:
#include <string>
#include <iostream>
#include <fstream>
using namespace std;
int main()
{
wstring wstr = L"Андрей";
wofstream fout(L"C:\\Work\\report.htm");
wcout << wstr << L"Привет мир";
fout << wstr << L"Привет мир";
fout.close();
}
Produces no output, and the file stays 0 bytes long.
Mixed text such as wcout << L"zuhщзг" prints just "zuh" and ignores the rest.
I use Visual Studio 2013 with Intel C++ Composer 14.0.
EDIT:
"Windows Unicode C++ Stream Output Failure" describes a similar problem, but I don't quite understand how the solution works.
Visual Studio/Windows uses UTF-16 for wide strings, and I would like them to be written to the file as is, that is, as UTF-16, without any unnecessary conversion.
For a console application, I need to display the symbol: √
When I try to simply output it using:
std::cout << '√' << std::endl; or
std::wcout << '√' << std::endl;,
it outputs the number 14846106 instead.
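A likely reading of that number: 14846106 is 0xE2889A, and E2 88 9A are exactly the three UTF-8 bytes of √. That would mean the compiler saw '√' as a multi-character constant and packed its bytes into one int. The sketch below only reproduces the arithmetic; the function name is hypothetical, and the most-significant-byte-first packing is what GCC documents, so other compilers may behave differently:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Packs up to four bytes into one integer, first byte most significant,
// the way GCC evaluates a multi-character constant.
std::uint32_t pack_bytes(const std::string& bytes)
{
    assert(bytes.size() <= 4);
    std::uint32_t value = 0;
    for (unsigned char b : bytes) {
        value = (value << 8) | b;
    }
    return value;
}
```

pack_bytes("\xE2\x88\x9A") gives 14846106, the same number that was printed.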
I've tried searching for an answer and have found several recommendations of the following:
std::cout << "\xFB" << std::endl; and
std::cout << (unsigned char)251 << std::endl;
which both display a superscript 1.
This is using the Windows console with the Lucida font. I've tried this with various code pages and always get the same superscript 1. When I try to find its value through getchar() or cin, the symbol is converted into the capital letter V. I am, however, sure that the console can display this character, because pasting it in works. Is there an easy way of displaying Unicode characters?
Actually, "\xFB" and (unsigned char)251 are the same byte, and it does correspond to the root symbol √, but only in code pages such as 437. In other code pages (850, for example) that byte is ¹ (superscript 1), whichever font is selected.
Switching to Unicode with the STL is a possibility, but I doubt it will run on Windows...
#include <iostream>
#include <locale>
int main() {
std::locale::global(std::locale("en_US.UTF8"));
std::wcout.imbue(std::locale());
wchar_t root = L'√';
std::wcout << root << std::endl;
return 0;
}
Since this will not satisfy you, here is a portable Unicode library: http://site.icu-project.org/