no output with wide streams - c++

I have a problem with wide stream output. My primary concern is wofstream, but wcout doesn't work properly either: neither produces any output beyond Latin characters.
For example:
#include <string>
#include <iostream>
#include <fstream>
using namespace std;
int main()
{
wstring wstr = L"Андрей";
wofstream fout(L"C:\\Work\\report.htm");
wcout << wstr << L"Привет мир";
fout << wstr << L"Привет мир";
fout.close();
}
This produces no output, and the file stays 0 bytes long.
Mixing scripts, as in wcout << L"zuhщзг", prints just "zuh" and ignores the rest.
I use MVS 2013 with Intel C++ Composer 14.0
EDIT:
Windows Unicode C++ Stream Output Failure describes a similar problem, but I don't quite understand how the solution works.
MVS/Windows use UTF-16 for wide strings, and I would like them to be written to the file as is, that is, as UTF-16, without any unnecessary conversion.

Related

wcout does not output as desired

I've been trying to write a C++ application for a project and I ran into this issue. Basically:
class OBSClass
{
public:
wstring ClassName;
uint8_t Credit;
uint8_t Level;
OBSClass() : ClassName(), Credit(), Level() {}
OBSClass(wstring name, uint8_t credit, uint8_t hyear)
: ClassName(name), Credit(credit), Level(hyear)
{}
};
In some other file:
vector<OBSClass> AllClasses;
...
AllClasses.push_back(OBSClass(L"Bilişim Sistemleri Mühendisliğine Giriş", 3, 1));
AllClasses.push_back(OBSClass(L"İş Sağlığı ve Güvenliği", 3, 1));
AllClasses.push_back(OBSClass(L"Türk Dili 1", 2, 1));
... (rest omitted; some of the entries have non-ASCII characters like 'ş' and 'İ')
I have a function that outputs everything in AllClasses; the problem is that wcout does not output as desired.
void PrintClasses()
{
for (size_t i = 0; i < AllClasses.size(); i++)
{
wcout << "Class: " << AllClasses[i].ClassName << "\n";
}
}
The output is 'Class: Bili' and nothing else. The program does not even try to output the other entries; it just hangs. I am on Windows using G++ 6.3.0, and I am not using Windows' cmd but bash from MinGW, so encoding should not be a problem (or is it?). Any advice?
Edit: Source code encoding is not the problem either; I just checked, and it is UTF-8, the VS Code default.
Edit: I also checked whether the problem is with string literals:
wstring test;
wcin >> test;
wcout << test;
I entered some non-ASCII characters like 'ö' and 'ş', and it works perfectly. What is the problem with wide string literals?
Edit: Here is a minimal example:
#include <iostream>
#include <string>
#include <vector>
using namespace std;
vector<wstring> testvec;
int main()
{
testvec.push_back(L"Bilişim Sistemleri Mühendisliğine Giriş");
testvec.push_back(L"ıiÖöUuÜü");
testvec.push_back(L"☺☻♥♦♣♠•◘○");
for (size_t i = 0; i < testvec.size(); i++)
wcout << testvec[i] << "\n";
return 0;
}
Compile with G++:
g++ file.cc -O3
This code only outputs 'Bili'. It must be something to do with g++ mangling the encoding of the literals (?), since entering values with wcin and then outputting them with wcout does not cause any problem.
The following code works for me, using MinGW-w64 7.3.0 in both MSYS2 Bash and Windows CMD, with the source encoded as UTF-8:
#include <iostream>
#include <locale>
#include <string>
#include <codecvt>
int main()
{
std::ios_base::sync_with_stdio(false);
std::locale utf8( std::locale(), new std::codecvt_utf8_utf16<wchar_t> );
std::wcout.imbue(utf8);
std::wstring w(L"Bilişim Sistemleri Mühendisliğine Giriş");
std::wcout << w << '\n';
}
Explanation:
The Windows console doesn't accept 16-bit output through an ordinary byte stream like stdout. It interprets bytes in the active code page, with only partial UTF-8 support, so you need to configure wcout to convert its output to UTF-8. The console does not use UTF-8 by default for backwards-compatibility reasons, though Windows 10 1803 adds an option to make UTF-8 the default.
Imbuing wcout with a codecvt_utf8_utf16 facet achieves this. However, you also need to disable sync_with_stdio. Otherwise the stream doesn't use the facet at all and just defers to stdout, which has a similar problem.
For writing to other files, I found that the same technique works for UTF-8. To write a UTF-16 file, you need to imbue the wofstream with a UTF-16 facet and write a BOM manually.
Commentary: Many people simply avoid wide iostreams altogether because of these issues.
You can write a UTF-8 file using a narrow stream. If you use wstring internally, call a function to convert each wstring to UTF-8; alternatively, you can use UTF-8 internally.
You can also write a UTF-16 file using a narrow stream, just not with operator<< from a wstring.
If you have at least Windows 10 1903 (May 2019) and at least Windows Terminal 0.3.2142 (Aug 2019), you can switch the OEM code page to UTF-8 (65001) in the registry:
Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage]
"OEMCP"="65001"
Then restart. After that you can use plain narrow strings with std::cout:
#include <iostream>
int main() {
std::string a[] = {
"Bilişim Sistemleri Mühendisliğine Giriş",
"Türk Dili 1",
"İş Sağlığı ve Güvenliği",
"ıiÖöUuÜü",
"☺☻♥♦♣♠•◘○"
};
for (const auto& s : a) {
std::cout << s << std::endl;
}
}

Farsi character utf8 in c++

I'm trying to read and write Farsi characters in C++, and I want to show them in CMD.
The first thing I fixed was the font: I added Farsi characters to it, and now I can write on the screen, for example ب (U+0628), with this code:
#include <cstdlib> // system
#include <iostream>
#include <io.h>
#include <fcntl.h>
using namespace std;
int main() {
_setmode(_fileno(stdout), _O_U16TEXT);
wcout << L"\u0628 \n";
wcout << L"ب"<<endl;
system("pause");
}
But how can I store this character? For Latin characters we can use char or string, but what about Farsi characters in UTF-8?
And how can I read them? For Latin characters we use cin >> or gets_s.
Should I use wchar_t? If yes, how?
Because with this code it shows the wrong character:
wchar_t a='\u0628';
wcout <<a;
And I can't show this character, بـ (U+FE91), even though it exists in my installed font, while ب (U+0628) is shown correctly.
Thanks in advance.
The solution is the following line:
wchar_t a=L'\u0628';
The L prefix makes this a wide character literal of type wchar_t ("large" type, I guess? At least that's how I remember it), so the value doesn't get truncated to 8 bits, and this works as intended.
UPDATE
If you are building and running this as a Windows console application, you need to manage your code pages accordingly. For Cyrillic input (Windows code page 1251), the following worked for me when I set the proper code page before any wcin and wcout calls, basically at the very top of main():
SetConsoleOutputCP(1251);
SetConsoleCP(1251);
For Farsi I'd expect you should use code page 1256.
Full test code for your reference:
#include <iostream>
#include <Windows.h>
using namespace std;
int main()
{
SetConsoleOutputCP(1256); // to manage console output
SetConsoleCP(1256); // to properly process console input
wchar_t b;
wcin >> b;
wcout << b << endl;
}

`std::wcout << L"\u25a0" << std::endl;` outputs nothing, and anything <<'d to wcout thereafter also outputs nothing [duplicate]

Consider the following code snippet, compiled as a Console Application on MS Visual Studio 2010/2012 and executed on Win7:
#include "stdafx.h"
#include <iostream>
#include <string>
const std::wstring test = L"hello\xf021test!";
int _tmain(int argc, _TCHAR* argv[])
{
std::wcout << test << std::endl;
std::wcout << L"This doesn't print either" << std::endl;
return 0;
}
The first wcout statement outputs "hello" (instead of something like "hello?test!")
The second wcout statement outputs nothing.
It's as if 0xf021 (and other?) Unicode characters cause wcout to fail.
This particular Unicode character, 0xf021 (encoded as UTF-16), is part of the "Private Use Area" in the Basic Multilingual Plane. I've noticed that Windows Console applications do not have extensive support for Unicode characters, but typically each character is at least represented by a default character (e.g. "?"), even if there is no support for rendering a particular glyph.
What is causing the wcout stream to choke? Is there a way to reset it after it enters this state?
wcout, or to be precise, a wfilebuf instance it uses internally, converts wide characters to narrow characters, then writes those to the file (in your case, to stdout). The conversion is performed by the codecvt facet in the stream's locale; by default, that just does wctomb_s, converting to the system default ANSI codepage, aka CP_ACP.
Apparently, character '\xf021' is not representable in the default codepage configured on your system. So the conversion fails, and failbit is set in the stream. Once failbit is set, all subsequent calls fail immediately.
I do not know of any way to get wcout to successfully print arbitrary Unicode characters to console. wprintf works though, with a little tweak:
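#include <fcntl.h>
#include <io.h>
#include <stdio.h>
#include <tchar.h>
#include <string>
const std::wstring test = L"hello\xf021test!";
int _tmain(int argc, _TCHAR* argv[])
{
_setmode(_fileno(stdout), _O_U16TEXT);
// Pass the text as an argument, never as the format string itself.
wprintf(L"%ls", test.c_str());
return 0;
}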
Setting the mode for stdout to _O_U16TEXT will allow you to write Unicode characters to the wcout stream as well as wprintf. (See Conventional wisdom is retarded, aka What the ##%&* is _O_U16TEXT?) This is the right way to make this work.
_setmode(_fileno(stdout), _O_U16TEXT);
std::wcout << L"hello\xf021test!" << std::endl;
std::wcout << L"\x043a\x043e\x0448\x043a\x0430 \x65e5\x672c\x56fd" << std::endl;
std::wcout << L"Now this prints!" << std::endl;
It shouldn't be necessary any more, but you can reset a stream that has entered an error state by calling clear():
if (std::wcout.fail())
{
std::wcout.clear();
}

Output unicode symbol π and ≈ in c++ win32 console application

I am fairly new to programming, but it seems that the π (pi) symbol is not among the characters that ASCII can represent.
I am wondering if there is a way to get the console to output the π symbol, so as to express exact answers regarding certain mathematical formulas.
I'm not really sure about other methods (such as those that use the STL), but you can do this with Win32 using WriteConsoleW:
#include <windows.h>
int main()
{
HANDLE hConsoleOutput = GetStdHandle(STD_OUTPUT_HANDLE);
LPCWSTR lpPiString = L"\u03C0";
DWORD dwNumberOfCharsWritten;
WriteConsoleW(hConsoleOutput, lpPiString, 1, &dwNumberOfCharsWritten, NULL);
}
The Microsoft CRT is not very Unicode-savvy, so it may be necessary to bypass it and call the console API directly. If you compile for Unicode, WriteConsole() maps to WriteConsoleW(); otherwise, call WriteConsoleW() explicitly, as above.
I'm in the learning phase of this, so correct me if I get something wrong.
It seems like this is a three-step process:
Use the wide versions of cout, cin, string and so on: wcout, wcin, wstring.
Before using a stream, set it to a Unicode-friendly mode.
Configure the target console to use a Unicode-capable font.
You should now be able to rock those funky åäös.
Example:
#include <cstdio>   // stdout, stdin
#include <fcntl.h>  // _O_WTEXT, _O_U16TEXT, _O_U8TEXT and other mode flags
#include <io.h>     // _setmode, _fileno
#include <iostream>
#include <string>
// _O_WTEXT (0x10000): file mode is UTF-16 (translated).
// _O_U8TEXT (0x40000) would be the choice for UTF-8 without a BOM.
int main()
{
// To be able to write UTF-16 to stdout.
_setmode(_fileno(stdout), _O_WTEXT);
// To be able to read UTF-16 from stdin.
_setmode(_fileno(stdin), _O_WTEXT);
const wchar_t* hallå = L"Hallå, värld!";
std::wcout << hallå << std::endl;
// It's all Greek to me. Go UU!
std::wstring etabetapi = L"η β π";
std::wcout << etabetapi << std::endl;
std::wstring myInput;
std::wcin >> myInput;
std::wcout << myInput << L" has " << myInput.length() << L" characters." << std::endl;
// This character won't show using Consolas or Lucida Console
std::wcout << L"♔" << std::endl;
}

Stumped with Unicode, Boost, C++, codecvts

In C++, I want to use Unicode to do things. So after falling down the rabbit hole of Unicode, I've managed to end up in a train wreck of confusion, headaches and locales.
But in Boost I've had the unfortunate problem of trying to use Unicode file paths and trying to use the Boost program options library with Unicode input. I've read whatever I could find on the subjects of locales, codecvts, Unicode encodings and Boost.
My current attempt to get things to work is to have a codecvt that takes a UTF-8 string and converts it to the platform's encoding (UTF-8 on POSIX, UTF-16 on Windows); I've been trying to avoid wchar_t.
The closest I've actually gotten is trying to do this with Boost.Locale, to convert from a UTF-8 string to a UTF-32 string on output.
#include <iostream>
#include <string>
#include <boost/locale.hpp>
#include <locale>
int main(void)
{
std::string data("Testing, 㤹");
std::locale fromLoc = boost::locale::generator().generate("en_US.UTF-8");
std::locale toLoc = boost::locale::generator().generate("en_US.UTF-32");
typedef std::codecvt<wchar_t, char, mbstate_t> cvtType;
cvtType const* toCvt = &std::use_facet<cvtType>(toLoc);
std::locale convLoc = std::locale(fromLoc, toCvt);
std::cout.imbue(convLoc);
std::cout << data << std::endl;
// Output is unconverted -- what?
return 0;
}
I think I had some other kind of conversion working using wide characters, but I really don't know what I'm even doing. I don't know what the right tool for the job is at this point. Help?
Okay, after a long few months I've figured it out, and I'd like to help people in the future.
First of all, the codecvt approach was the wrong way of doing it. Boost.Locale provides a simple way of converting between character sets in its boost::locale::conv namespace. Here's one example (there are others not based on locales):
#include <boost/locale.hpp>
#include <iostream>
#include <string>
namespace loc = boost::locale;
int main(void)
{
loc::generator gen;
std::locale blah = gen.generate("en_US.utf-32");
std::string UTF8String = "Tésting!";
// from_utf will also work with wide strings as it uses the character size
// to detect the encoding.
std::string converted = loc::conv::from_utf(UTF8String, blah);
// Outputs a UTF-32 string.
std::cout << converted << std::endl;
return 0;
}
As you can see, if you replace the "en_US.utf-32" with "" it'll output in the user's locale.
I still don't know how to make std::cout do this all the time, but the translate() function of Boost.Locale outputs in the user's locale.
As for using UTF-8 strings for filesystem paths cross-platform, that seems to be possible too.
std::cout.imbue(convLoc);
std::cout << data << std::endl;
This does no conversion, since it uses codecvt<char, char, mbstate_t> which is a no-op. The only standard streams that use codecvt are file-streams. std::cout is not required to perform any conversion at all.
To force Boost.Filesystem to interpret narrow strings as UTF-8 on Windows, use boost::filesystem::imbue with a locale containing a UTF-8 ↔ UTF-16 codecvt facet. Boost.Locale provides an implementation of such a facet.
The Boost filesystem iostream replacement classes work fine with UTF-16 when used with Visual C++.
However, they do not work (in the sense of supporting arbitrary filenames) when used with g++ in Windows - at least as of Boost version 1.47. There is a code comment explaining that; essentially, the Visual C++ standard library provides non-standard wchar_t based constructors that Boost filesystem classes make use of, but g++ does not support these extensions.
A workaround is to use 8.3 short filenames, but this solution is a bit brittle since with old Windows versions the user can turn off automatic generation of short filenames.
Example code for using Boost filesystem in Windows:
#include "CmdLineArgs.h" // CmdLineArgs
#include "throwx.h" // throwX, hopefully
#include "string_conversions.h" // ansiOrFillerFrom( wstring )
#include <boost/filesystem/fstream.hpp> // boost::filesystem::ifstream
#include <iostream> // std::cout, std::cerr, std::endl
#include <stdexcept> // std::runtime_error, std::exception
#include <string> // std::string
#include <stdlib.h> // EXIT_SUCCESS, EXIT_FAILURE
using namespace std;
namespace bfs = boost::filesystem;
inline string ansi( wstring const& ws ) { return ansiWithFillersFrom( ws ); }
int main()
{
try
{
CmdLineArgs const args;
wstring const programPath = args.at( 0 );
hopefully( args.nArgs() == 2 )
|| throwX( "Usage: " + ansi( programPath ) + " FILENAME" );
wstring const filePath = args.at( 1 );
bfs::ifstream stream( filePath ); // Nice Boost ifstream subclass.
hopefully( !stream.fail() )
|| throwX( "Failed to open file '" + ansi( filePath ) + "'" );
string line;
while( getline( stream, line ) )
{
cout << line << endl;
}
hopefully( stream.eof() )
|| throwX( "Failed to list contents of file '" + ansi( filePath ) + "'" );
return EXIT_SUCCESS;
}
catch( exception const& x )
{
cerr << "!" << x.what() << endl;
}
return EXIT_FAILURE;
}