How to read a single char with std::wifstream? [duplicate] - c++

You wouldn't imagine that something as basic as opening a file with the C++ standard library in a Windows application would be tricky... but it appears to be. By Unicode here I mean UTF-8, but I can convert to UTF-16 or whatever; the point is getting an ofstream instance from a Unicode filename. Before I hack up my own solution, is there a preferred route here? Especially a cross-platform one?

The C++ standard library is not Unicode-aware. char and wchar_t are not required to be Unicode encodings.
On Windows, wchar_t is UTF-16, but there's no direct support for UTF-8 filenames in the standard library (the char data type is not Unicode on Windows).
With MSVC (and thus the Microsoft STL), a constructor for filestreams is provided which takes a const wchar_t* filename, allowing you to create the stream as:
wchar_t const name[] = L"filename.txt";
std::fstream file(name);
However, this overload is not specified by the C++11 standard (it only guarantees the presence of the char based version). It is also not present on alternative STL implementations like GCC's libstdc++ for MinGW(-w64), as of version g++ 4.8.x.
Note that just as char on Windows is not UTF-8, on other OSes wchar_t may not be UTF-16. So overall, this isn't likely to be portable. Opening a stream given a wchar_t filename isn't defined by the standard, and specifying the filename in chars may be difficult because the encoding used by char varies between OSes.

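Since C++17, there is a cross-platform way to open an std::fstream with a Unicode filename using the std::filesystem::path overload. In C++17 a u8 literal is a plain char string, which the path constructor reads in the native narrow encoding, so use std::filesystem::u8path there; from C++20 on, u8 literals are char8_t and the path constructor treats them as UTF-8. Example (C++20):
std::ofstream out(std::filesystem::path(u8"こんにちは"));
out << "hello";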
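When the filename arrives at run time as UTF-8 bytes in a std::string, a small helper can hide the difference between language versions. This is a minimal sketch, not part of the answer above: path_from_utf8 and write_greeting are hypothetical names, and the feature-test macro is assumed to be a good enough switch between the C++17 and C++20 spellings.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: build a path from bytes that are known to be UTF-8.
inline fs::path path_from_utf8(const std::string& utf8) {
#if defined(__cpp_char8_t)
    // C++20: char8_t input is always interpreted as UTF-8.
    return fs::path(std::u8string(utf8.begin(), utf8.end()));
#else
    // C++17: plain char input uses the native narrow encoding,
    // so ask for UTF-8 explicitly with u8path.
    return fs::u8path(utf8);
#endif
}

// Hypothetical helper: write one line to a file whose name is given in UTF-8.
inline bool write_greeting(const std::string& utf8_name) {
    std::ofstream out(path_from_utf8(utf8_name));
    out << "hello\n";
    return static_cast<bool>(out);
}
```

Calling write_greeting with the UTF-8 bytes of a name should create that file on any platform whose standard library implements std::filesystem.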

In current versions of Visual C++, std::basic_fstream has an open() method that takes a wchar_t*, according to http://msdn.microsoft.com/en-us/library/4dx08bh4.aspx.

Use std::wofstream, std::wifstream and std::wfstream. They accept a Unicode filename. The file name has to be a wstring, an array of wchar_t, or a literal wrapped in the _T() macro or prefixed with L.

Have a look at Boost.Nowide:
#include <boost/nowide/fstream.hpp>
#include <boost/nowide/cout.hpp>
using boost::nowide::ifstream;
using boost::nowide::cout;
// #include <fstream>
// #include <iostream>
// using std::ifstream;
// using std::cout;
#include <string>
int main() {
ifstream f("UTF-8 (e.g. ß).txt");
std::string line;
std::getline(f, line);
cout << "UTF-8 content: " << line;
}

Use wfstream instead of fstream, wofstream instead of ofstream, and so on.
You can find this information in the iosfwd header file.

If you're using Qt mixed with std::ifstream:
return std::wstring(reinterpret_cast<const wchar_t*>(qString.utf16()));
Note that the std::basic_ifstream constructor normally doesn't accept a const wchar_t*, but in the MS implementation of the STL it does. With other implementations you would probably call qString.toUtf8() (utf8() in Qt 3) and use the const char* constructor.

Related

Is it possible to unify std::wstring behavior in MSVC and GCC?

Here is a little program that reads a line from a UTF-8 file:
#include <iostream>
#include <io.h>
#include <fcntl.h>
#include <locale>
#include <fstream>
#include <codecvt>
int main()
{
_setmode(_fileno(stdout), _O_U8TEXT);
auto inputFileStream = std::wifstream("input.txt");
const auto utf8Locale = std::locale(std::locale(), new std::codecvt_utf8<wchar_t>());
inputFileStream.imbue(utf8Locale);
std::wstring line;
std::getline(inputFileStream, line);
std::wcout << line << std::endl;
inputFileStream.close();
return 0;
}
When I build it with the Visual C++ compiler in Visual Studio, I get the following result:
test τεστ тест
as expected.
But when I use MinGW with the GCC compiler, I get
琀攀猀琀 쐃딃쌃쐃 䈄㔄䄄䈄
As you can see, that is not the expected result.
Is there a simple way to fix the output for GCC so that it matches the expected string?
OR
Is there a simple way to use UTF-8 with both MSVC and GCC?
Answer (thanks to Igor Tandetnik and Remy Lebeau):
It seems we must specify the endian mode explicitly, because MSVC and GCC have different defaults. So
new std::codecvt_utf8<wchar_t, 0x10ffff, std::little_endian>()
should be used.
Fixed code:
#include <iostream>
#include <io.h>
#include <fcntl.h>
#include <locale>
#include <fstream>
#include <codecvt>
int main()
{
_setmode(_fileno(stdout), _O_U8TEXT);
auto inputFileStream = std::wifstream("input.txt");
const auto utf8Locale = std::locale(std::locale(), new std::codecvt_utf8<wchar_t, 0x10ffff, std::little_endian>());
inputFileStream.imbue(utf8Locale);
std::wstring line;
std::getline(inputFileStream, line);
std::wcout << line << std::endl;
inputFileStream.close();
return 0;
}
For your second question, one option is to limit the use of UTF-16 and the std::w-prefixed facilities to the cases where you need to exchange UTF-16 encoded strings with the operating system. This happens when you receive arguments in wmain, open a file with _wfopen, call a Windows API function, etc. Otherwise, you would store, get from the user, and return to the user UTF-8 strings using the char type (char*, std::string, etc.). Conversion between UTF-8 and UTF-16 can be done with MultiByteToWideChar and WideCharToMultiByte, bypassing the awkward C++ encoding API.
The place where this does not work well is console input/output. You can output UTF-8 to the console if the user sets chcp 65001 and a TrueType font. At least in Windows 7, you also have to make sure not to split a character between two write calls, otherwise it will not print correctly (this also implies you cannot use std::cout, because msvcrt calls putc for every byte separately, so you need to use puts, fprintf, etc. instead); I heard that this was fixed in Windows 10, but cannot confirm. Reading UTF-8 from the console with the file API does not work as far as I know; if you want that, you need to detect that stdin is attached to a console and use the console API instead.
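On Windows the boundary conversion would normally be a call to MultiByteToWideChar with CP_UTF8. To show what that conversion does without depending on the Windows headers, here is a portable hand-written sketch; utf8_to_utf16 is a hypothetical name, and the decoder is deliberately simple (malformed input becomes U+FFFD, and overlong encodings are not rejected).

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 bytes into UTF-16 code units.
// Malformed sequences are replaced with U+FFFD; overlong forms are not detected.
inline std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0xFFFD;   // stays U+FFFD if the lead byte is invalid
        std::size_t extra = 0;  // continuation bytes expected after the lead
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        ++i;

        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= in.size() ||
                (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80) {
                ok = false;  // truncated or broken sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
            ++i;
        }
        if (!ok || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            cp = 0xFFFD;
        }

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The idea is to keep std::string everywhere in the program and call a function like this only immediately before handing a string to a wide-character OS API.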

Error 'LC_TYPE' was not declared in this scope

I'm writing a program in C++ that is supposed to change the letters in a text to uppercase (the program works, but setlocale does not). It gives me this error: [Error] 'LC_TYPE' was not declared in this scope. It "should" work because it comes from my official faculty literature.
#include <iostream>
#include <string>
using namespace std;
int main() {
cout << "Write something: " << endl;
string tekst; //tekst=text
getline(cin, tekst);
setlocale(LC_TYPE, "croatian"); // here is problem...
for (char znak : tekst){ //znak=char, symbol...
char velikoSlovo = toupper(znak); // velikoSlovo=uppercaseLetter
cout << velikoSlovo;
}
cout << endl;
return 0;
}
Does anyone know how to fix this?
I'm using Orwell Dev C++ 5.9.2. Language standard (-std) is ISO C++ 11.
Don't you need to #include <clocale>?
Edit:
Actually, #include <locale.h> should be preferred to <clocale> to reduce portability issues. Thanks to @Cheers for mentioning it in the comments.
As the documentation says, you should use LC_CTYPE:
http://www.cplusplus.com/reference/clocale/setlocale/
Also, you need to provide a valid locale name.
So for Croatian, your line should look like:
setlocale(LC_CTYPE, "hr_HR.UTF-8"); // or just "hr_HR"
or you can just use:
setlocale(LC_ALL, "");
which sets the locale to the default one configured on your computer.
And as suggested before, you may also need to add #include <clocale>.
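Which locale names are valid depends on the platform, and setlocale reports an unknown name by returning a null pointer rather than by failing loudly. A minimal sketch of checking that result, assuming you are willing to try several spellings in order (set_first_available_ctype is a hypothetical helper name):

```cpp
#include <clocale>
#include <initializer_list>

// Try each candidate name in order and return the first one that
// setlocale accepts, or nullptr if none is available on this system.
inline const char* set_first_available_ctype(
        std::initializer_list<const char*> names) {
    for (const char* name : names) {
        if (std::setlocale(LC_CTYPE, name) != nullptr) {
            return name;
        }
    }
    return nullptr;
}
```

A call such as set_first_available_ctype({"hr_HR.UTF-8", "hr_HR", "croatian", ""}) would fall back to the user's default locale when none of the explicit names exist.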
Add
#include <locale.h>
to get a declaration of setlocale and its associated constants.
In general just read the relevant documentation of whatever function or associated things the compiler doesn't seem to recognize.
In general the chosen approach may work with single-byte oriented encodings, but not with multibyte encodings such as UTF-8. Single-byte encodings are commonly used in Windows. In Unix-land it's UTF-8 that rules.
And for Windows, you generally want the system's default locale, so instead of
setlocale(LC_TYPE, "croatian");
you may well (though not necessarily) be better served by
setlocale(LC_ALL, "");
where the empty string selects the system locale rather than the default pure ASCII "C" locale.
Also, note that toupper from the C library requires a non-negative argument, or else the special value EOF. You can just cast the argument to unsigned char, when you know that it's not EOF.
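The cast mentioned above can be sketched as follows. It is an illustration only (to_upper_bytes is a hypothetical name), and like the original loop it works byte by byte, so it suits single-byte encodings and leaves multibyte UTF-8 sequences for some other mechanism.

```cpp
#include <cctype>
#include <string>

// Uppercase a string one byte at a time using the current C locale.
inline std::string to_upper_bytes(std::string text) {
    for (char& c : text) {
        // toupper needs a value representable as unsigned char (or EOF),
        // so convert first; a negative char would be undefined behavior.
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return text;
}
```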

How to write UTF-8 file with fprintf in C++

I am programming (just occasionally) in C++ with Visual Studio and MFC. I write a file with fopen and fprintf. The file should be encoded in UTF-8. Is there any way to do this? Whatever I try, the file is encoded either as double-byte Unicode or as ISO-8859-2 (Latin-2).
Glanebridge
You shouldn't need to set your locale or set any special modes on the file if you just want to use fprintf. You simply have to use UTF-8 encoded strings.
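#include <cstdio>
#include <codecvt>
#include <locale>
#include <string>
int main() {
    // Convert the wide literal to UTF-8 bytes (assumes wchar_t is UTF-16, as on Windows).
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>,wchar_t> convert;
    std::string utf8_string = convert.to_bytes(L"кошка 日本国");
    if(FILE *f = fopen("tmp","w")) {
        fprintf(f,"%s\n",utf8_string.c_str());
        fclose(f);
    }
}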
Save the program as UTF-8 with signature or UTF-16 (i.e. don't use UTF-8 without signature, otherwise VS won't produce the right string literal). The file written by the program will contain the UTF-8 version of that string. Or you can do:
#include <cstdio>
int main() {
    if(FILE *f = fopen("tmp","w")) {
        fprintf(f,"%s\n","кошка 日本国");
        fclose(f);
    }
}
In this case you must save the file as UTF-8 without signature, because you want the compiler to think the source encoding is the same as the execution encoding... This is a bit of a hack that relies on the compiler's, IMO, broken behavior.
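One way to avoid depending on the source file encoding at all is to spell the UTF-8 bytes with escapes. This is a sketch under that assumption, not something from the answer itself; write_utf8_line and kCatUtf8 are hypothetical names.

```cpp
#include <cstdio>

// Write one line of already-encoded UTF-8 text; returns false on any I/O error.
inline bool write_utf8_line(const char* path, const char* utf8_text) {
    std::FILE* f = std::fopen(path, "wb");  // binary mode: no newline translation
    if (!f) {
        return false;
    }
    const bool ok = std::fprintf(f, "%s\n", utf8_text) >= 0;
    return (std::fclose(f) == 0) && ok;
}

// The word "кошка" written as explicit UTF-8 bytes, so the compiler's
// idea of the source and execution character sets does not matter.
static const char kCatUtf8[] = "\xD0\xBA\xD0\xBE\xD1\x88\xD0\xBA\xD0\xB0";
```

Calling write_utf8_line("tmp", kCatUtf8) should produce the same bytes regardless of how the source file was saved.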
You can do basically the same thing with any of the other APIs for writing narrow characters to a file, but note that none of these methods work for writing UTF-8 to the Windows console. Because the C runtime and/or the console is a bit broken, you can only write UTF-8 directly to the console by calling SetConsoleOutputCP(65001) and then using one of the puts family of functions.
If you want to use wide characters instead of narrow characters then locale based methods and setting modes on file descriptors could come into play.
#include <cstdio>
#include <fcntl.h>
#include <io.h>
int main() {
    if(FILE *f = fopen("tmp","w")) {
        _setmode(_fileno(f), _O_U8TEXT);  // wide output is converted to UTF-8
        fwprintf(f,L"%ls\n",L"кошка 日本国");  // %ls is the portable wide-string specifier
        fclose(f);
    }
}
Or, using iostreams:
#include <fstream>
#include <codecvt>
#include <locale>
int main() {
if(auto f = std::wofstream("tmp")) {
f.imbue(std::locale(std::locale(),
new std::codecvt_utf8_utf16<wchar_t>)); // assumes wchar_t is UTF-16
f << L"кошка 日本国\n";
}
}
Yes, but you need Visual Studio 2005 or later. You can then open the file with the ccs parameter:
LPCTSTR strText = _T("абв");
FILE *f = _tfopen(pszFilePath, _T("w,ccs=UTF-8"));
_ftprintf(f, _T("%s"), strText);
Keep in mind that this is a Microsoft extension, and that a stream opened this way expects wide-character output, so use a Unicode build. It probably won't work with GCC or other compilers.
In theory, you should simply set a locale which uses UTF-8 as external encoding. My understanding -- I'm not a Windows programmer -- is that Windows has no such locale, so you have to resort to implementation specific means or non standard libraries (link from Dave's comment).

mixing C and C++ file operations

I am writing a file splitting program, to assist with using large files with iPod notes. I want to use tmpfile() from cstdio, but it returns a FILE*, not an fstream object. I know it's not possible in standard C++, but does anyone know any libraries that work well with the standard and can convert a FILE* to an std::fstream object? Or, if not, is tmpfile()-like functionality available in the standard library or in another library?
Thanks!
My OS is Windows XP and I use either Dev-C++ 4.9.9.2 or MS Visual Studio 2008 as my compiler.
If all you want is a temporary file, use tmpnam() instead. It returns a char* name that can be used for a temporary file, so just open an fstream object with that name.
Something like:
#include <cstdio>
#include <fstream>
...
char name[L_tmpnam];
tmpnam(name);
//also could be:
//char *name;
//name = tmpnam(NULL);
std::fstream file(name);
You do have to delete the file yourself, though, using remove() or some other method.
You can get the benefits of C++ streams by pumping your data via the << syntax into a std::stringstream and later writing the string you get from .str() to the FILE* via the C API.
#include <sstream>
#include <cstdio>
#include <string>
using namespace std;
int main()
{
stringstream ss;
ss << "log start" << endl;
// ... more logging
FILE* f_log = fopen("bar.log", "w");
if (!f_log) return 1; // could not open the log file
string logStr = ss.str();
fwrite(logStr.c_str(), sizeof(char), logStr.size(), f_log);
fclose(f_log);
return 0;
}
Even if you manage to convert a FILE* to an std::fstream, that won't work as advertised. The FILE object returned by tmpfile() has a special property: when it is closed with fclose() (or when the program terminates), the file is automatically removed from the filesystem. I don't know how to replicate the same behavior with std::fstream.
You could use tmpnam or mktemp to obtain a temporary file name, open it with a stream, and then delete it with remove.
char name[] = "fileXXXXXX"; // mktemp needs a writable template ending in XXXXXX
ifstream stream;
mktemp(name); // fills in the Xs in place (_mktemp on Windows)
stream.open(name);
//do stuff with stream here.
remove(name);//delete file.
Instead of using std::fstream, you could write a simple wrapper class around FILE*, which closes it on destruction. Should be quite easy. Define operators like << as necessary.
Be sure to disallow copying, to avoid multiple close() calls.
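Such a wrapper might look like the sketch below. CFile is a hypothetical name, only a std::string overload of << is shown, and it uses C++11 syntax (= delete) that the older compilers mentioned in the question would need replaced with private, undefined copy operations.

```cpp
#include <cstdio>
#include <string>

// Minimal owning wrapper around FILE*; closes the stream in the destructor.
class CFile {
public:
    explicit CFile(std::FILE* f) : f_(f) {}
    ~CFile() {
        if (f_) {
            std::fclose(f_);
        }
    }

    // Copying is disabled so the same FILE* is never closed twice.
    CFile(const CFile&) = delete;
    CFile& operator=(const CFile&) = delete;

    bool is_open() const { return f_ != nullptr; }
    std::FILE* get() const { return f_; }

    // Stream-style output for strings; add more overloads as needed.
    CFile& operator<<(const std::string& s) {
        if (f_) {
            std::fwrite(s.data(), 1, s.size(), f_);
        }
        return *this;
    }

private:
    std::FILE* f_;
};
```

Constructed as CFile tmp(std::tmpfile()), the object keeps tmpfile()'s delete-on-close behavior, because the underlying FILE* is still closed with fclose.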
g++ has __gnu_cxx::stdio_filebuf and __gnu_cxx::stdio_sync_filebuf, in ext/stdio_filebuf.h and ext/stdio_sync_filebuf.h. It should be straight-forward to extract them from libstdc++ if your compiler is not g++.