Using ifstream when filename contains wide characters - c++

Using C++Builder XE5 (bcc32) in Windows 7.
I am trying to open a file whose filename contains a wide character. The actual filename I'm testing with is C:\bΛx\foo.txt. The non-ASCII character there is U+039B.
I have this filename stored correctly in a std::wstring. However, trying:
std::ifstream f( filename.c_str() );
fails to open the file.
Of course, in Standard C++ fopen only takes char *. However, the Dinkumware C++ RTL implementation has an overload accepting wchar_t *. Unfortunately the implementation of that overload in ...\Embarcadero\RAD Studio\12.0\source\cpprtl\Source\dinkumware\source\fiopen.cpp does not call _wfopen. Instead it uses wcstombs to convert the string to UTF-8 and then calls fopen.
Checking the source for fopen, it calls the narrow version of an underlying function ___topen which ultimately passes the UTF-8 string to CreateFile.
When I inspect the attempt to open the file using Sysinternals Process Monitor, it shows that it did attempt to open the file with a UTF-8 file string, and the operating system rejected this with the result NAME COLLISION.
If I open the file using _wfopen( filename.c_str(), L"r" ) then all is well and I can read the file using C I/O functions, but I can't use C++ iostreams of course.
Is there any way to use std::ifstream to open a file with U+039B or other such characters in the filename?
Note that using std::wifstream doesn't work either (it still tries to open the UTF-8 version of the filename).

"If I open the file using _wfopen( filename.c_str(), L"r" ) then all is well and I can read the file using C I/O functions, but I can't use C++ iostreams of course."
I don't see that "of course". Your problem is reduced to making an iostreams streambuf from a FILE*. Howard Hinnant answered here that there's no method provided by the Standard, but implementing a streambuf-derived class on top of FILE* is pretty straightforward. He even mentions some code that he feels would be a good starting point.
Note that this only makes sense for a text file. iostreams and binary files do not get along; there's a character encoding layer and ios_base::binary does not turn that off.
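To make the streambuf-on-FILE* idea concrete, here is a minimal read-only sketch. It is not the code Howard Hinnant refers to; the names stdio_istreambuf and first_line are hypothetical, and the buffer size is an arbitrary choice. It only overrides underflow(), so it supports sequential reading and nothing else (no seeking, no writing).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <istream>
#include <streambuf>
#include <string>

// Read-only stream buffer that pulls characters from an already-open FILE*.
// It does not own the FILE*; the caller remains responsible for fclose().
class stdio_istreambuf : public std::streambuf {
public:
    explicit stdio_istreambuf(std::FILE* file) : file_(file) {
        assert(file_ != nullptr);
        // Start with an empty get area so the first read calls underflow().
        setg(buffer_, buffer_, buffer_);
    }

protected:
    int_type underflow() override {
        if (gptr() < egptr())
            return traits_type::to_int_type(*gptr());
        // Refill the get area from the C stream.
        const std::size_t n = std::fread(buffer_, 1, sizeof buffer_, file_);
        if (n == 0)
            return traits_type::eof();
        setg(buffer_, buffer_, buffer_ + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    std::FILE* file_;
    char buffer_[4096];
};

// Hypothetical helper: reads the first line of an open FILE* via iostreams.
std::string first_line(std::FILE* file) {
    stdio_istreambuf buf(file);
    std::istream in(&buf);
    std::string line;
    std::getline(in, line);
    return line;
}
```

For the question above, the FILE* would come from _wfopen( filename.c_str(), L"r" ), and the std::istream built on top of the buffer can then be used wherever an ifstream was intended.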

Related

C++ UTF-8/ASCII to UTF-16 in MFC

How can I convert a (text) file from UTF-8/ASCII to UTF-16 before it will be displaying in a MFC program?
I ask because MFC uses 16 bits per character, whereas most text files on Windows use UTF-8 or ASCII.
The simple answer is called MultiByteToWideChar and WideCharToMultiByte to do the reverse conversion. There's also CW2A and CA2W that are a little simpler to use.
However, I would strongly recommend against using these functions directly. You have the pain of handling character buffers manually, with the risk of creating memory corruption or security holes.
It's much better to use a library based on std::string and/or iterators, for example utf8cpp. This one has the advantage of being small, header-only and multiplatform.
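To show what such a std::string-based conversion does internally, here is a hand-written sketch. It is not the utf8cpp API and not a replacement for it: the function name utf8_to_utf16 is hypothetical, and the decoder is deliberately minimal (it rejects malformed lead and continuation bytes, truncated sequences and surrogate code points, but it does not reject overlong encodings).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes UTF-8 from `in` and re-encodes it as UTF-16.
// Throws std::invalid_argument on input this sketch recognises as malformed.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)               { cp = lead;        extra = 0; }
        else if ((lead >> 5) == 0x06)  { cp = lead & 0x1F; extra = 1; }
        else if ((lead >> 4) == 0x0E)  { cp = lead & 0x0F; extra = 2; }
        else if ((lead >> 3) == 0x1E)  { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

On Windows, wchar_t is 16 bits wide, so the resulting code units can be copied into a std::wstring or CString; on other platforms wchar_t is usually 32 bits and this does not apply.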
Actually, you can do it very simply, using the CStdioFile and CString classes provided by MFC. The MFC library is very powerful and comprehensive (albeit with some major oddities, and even bugs); if you're already using it, then use it to its fullest extent:
...
const wchar_t* inpPath = L"<path>\\InpFile.txt"; // These values are given just...
const wchar_t* outPath = L"<path>\\outFile.txt"; // ... for illustrative purposes!
CStdioFile inpFile(inpPath, CFile::modeRead | CFile::typeText);
CStdioFile outFile(outPath, CFile::modeWrite | CFile::modeCreate | CFile::typeText
    | CFile::typeUnicode); // Note the Unicode flag - will create UTF-16LE file!
CString textBuff;
while (inpFile.ReadString(textBuff)) {
    outFile.WriteString(textBuff);
    outFile.WriteString(L"\n");
}
inpFile.Close();
outFile.Close();
...
Of course, you will need to change the code (a bit) if you want the input and output files to have the same path, but that wouldn't mean changing the basic premise!
With this approach, there is no need for any library calls to convert character strings - just let MFC do it for you when it's reading/writing its (Unicode) CString object!
Note: Compiled and tested with MSVC (VS-2019), 64-bit, in Unicode mode.
EDIT: Maybe I misunderstood your question! If you don't want to actually convert the file, but just display the contents, then take away all references in my code to outFile and just do stuff with each textBuff object you read. The CString class takes care of all the required ASCII/UTF-8/UTF-16LE conversions.

C++ open file for writing only if it does not exist

I would like to open a file for writing with the standard library, but the file open should fail if the file already exists.
From what I can read in the documentation, ofstream::open only allows appending or truncating.
I could of course try to open for reading to check if the file exists, and reopen for writing if it doesn't, but there is no guarantee that the file will not be created by another process in between.
Could someone confirm this is not possible in C++ with the standard library (std::iostream) or with the C functions (FILE* functions)?
Since C11 (and thus also in C++17), for fopen you can use mode "x" — exclusive mode, see this:
File access mode flag "x" can optionally be appended to "w" or "w+"
specifiers. This flag forces the function to fail if the file exists,
instead of overwriting it.
There are no fstream ways of doing this, but std::fopen is as much C++ as std::sin.
If you absolutely must have an fstream object for this file and you need the atomic check, first call fopen; on success, call fclose and then fstream::open:
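#include <cstdio>
#include <fstream>
#include <string>

std::string random_file_name(); // assumed to be defined elsewhere

std::ofstream create_new_file_for_writing()
{
    std::FILE* fp = nullptr;
    std::string fname;
    do {
        fname = random_file_name();
        fp = std::fopen(fname.c_str(), "wx"); // "x": fail if the file exists
    } while (!fp);
    // here the file is created and you "own" the filename
    std::fclose(fp);
    return std::ofstream(fname);
}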
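For a fixed filename rather than a random one, the same idea can be sketched as below. The function name write_new_file is hypothetical, and the sketch assumes the C library honours the C11 "x" flag, which not every older runtime does.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Creates `path` and writes `text` to it, but only if `path` did not exist.
// Returns false if the file already existed or could not be created.
bool write_new_file(const std::string& path, const std::string& text) {
    // "wx" (C11): create for writing, fail if the file already exists.
    std::FILE* fp = std::fopen(path.c_str(), "wx");
    if (!fp)
        return false;
    std::fclose(fp);

    // The existence check and the creation happened in one step above.
    // Reopening by name is only safe against other writers that follow
    // the same "wx" protocol.
    std::ofstream out(path);
    out << text;
    return static_cast<bool>(out);
}
```

A second call with the same path returns false and leaves the content written by the first call untouched.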
In std::ofstream by itself, no. Opening a file for writing always creates a new file if it does not already exist. There is no option to change that behavior. Opening a file for reading fails if the file does not exist.
However, on Windows at least, the Win32 API CreateFile() function has a CREATE_NEW flag that fails to open the file if it already exists. On other platforms, there may be flags available for _fsopen() and fopen() to accomplish the same thing.
It is possible to attach a FILE* to a std::ofstream (or maybe this is just a Microsoft extension, I am not sure), and in Visual C++ a FILE* can be created for a HANDLE returned by CreateFile() by using _open_osfhandle() with _fdopen(). See this question for examples:
Can I use CreateFile, but force the handle into a std::ofstream?
Other compilers/platforms may provide similar extensions for initializing an std::ofstream, you will have to look around.

Convert istream to FILE*

Is it possible to convert an istream like std::cin to a FILE *? A cross-platform solution would be a plus.
EX: (FILE *)std::cin.
No, there is no standard way to obtain a FILE* from an IOStreams stream, nor vice versa.
std::cin is usually bound to file descriptor 0 (or, in FILE * form, stdin).
You could just use that. Other than that, the only way is to determine either the file descriptor (unlikely) or the filename, and use fopen to get a FILE *.
There is no easy way using FILE*; I would advise you to use fstream instead.
std::ifstream in("in.txt");
std::cin.rdbuf(in.rdbuf());
This way you redirect cin to your input file stream.
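One detail the two lines above leave out is that std::cin must be given its original buffer back before the ifstream is destroyed, otherwise it is left pointing at a dead stream buffer. A small sketch, with the hypothetical helper name read_word_via_cin:

```cpp
#include <fstream>
#include <iostream>
#include <streambuf>
#include <string>

// Temporarily redirects std::cin to the file at `path`, reads one word
// through std::cin, then restores the original buffer.
std::string read_word_via_cin(const std::string& path) {
    std::ifstream in(path);
    if (!in)
        return {};

    // rdbuf(new_buffer) returns the buffer that was installed before.
    std::streambuf* original = std::cin.rdbuf(in.rdbuf());

    std::string word;
    std::cin >> word;

    // Restore before `in` goes out of scope; this also clears error flags.
    std::cin.rdbuf(original);
    return word;
}
```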
Though your IOStreams implementation may well be implemented using a FILE*, the standard does not provide any way of accessing it. However, to my knowledge libstdc++ has a non-standard extension, __gnu_cxx::stdio_filebuf, which is a wrapper around a FILE*. You can use its file() method to return a pointer to the file.
Note that this class is non-portable and non-standard. I think you're better off writing your own stream buffer that emulates its behavior.
If you have a GNU userland (just about guaranteed on Linux), take a look at fopencookie(). That allows you to adapt any source / sink to a FILE*.
The linked man-page contains in-depth guidance on how to write the adaptor.

What does it mean to open an output file as both input and output?

I see code like this sometimes:
ofstream of("out.txt", ofstream::in | ofstream::out);
Why is it being opened as both input and output? What sense does it make to do that?
Sometimes one needs to read a file, do some processing, then update the original file with some new information. Opening the file for both input and output allows one to do that without closing and re-opening.
It also makes explicit one's intentions with regards to the file.
EDIT: My original answer didn't take into account that the OP's example uses ofstream instead of fstream. So …
Using ofstream with std::ios::in doesn't make any sense. Are there reads performed on the stream? If not, perhaps the code when originally written used an fstream, which was later changed to an ofstream. It's the kind of thing that can creep into a code base that's in maintenance mode.
Since you're using an ofstream, none of the input functions are available to you on the stream. Then opening the file in read-write mode isn't harmful, but it is pointless unless you're going to start hacking about with the underlying stream.
It's probably boilerplate, copy/pasted from some tutorial on the internet without the author actually understanding it.
This means nothing for the actual streams. The std::ios_base::in openmode will simply be ignored. But for the underlying stream buffers, it does mean something to open as both input and output: For example, file stream and string stream buffers allow putback functionality, but it can only be used if the buffers' std::ios_base::openmode specifies output.
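One effect at the file level is easy to overlook. The combination in | out corresponds to the C mode "r+", so the file must already exist and is not truncated, whereas a plain ofstream open truncates. The sketch below illustrates this; the helper name overwrite_prefix is hypothetical.

```cpp
#include <fstream>
#include <string>

// Overwrites the beginning of an existing file without truncating it.
// Returns false if the file could not be opened (for example, because
// it does not exist: in | out never creates a file).
bool overwrite_prefix(const std::string& path, const std::string& text) {
    std::ofstream of(path, std::ofstream::in | std::ofstream::out);
    if (!of)
        return false;
    of << text;
    return static_cast<bool>(of);
}
```

With a file containing "hello world", calling overwrite_prefix(path, "HE") leaves "HEllo world" on disk; with the default mode the file would contain only "HE".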

Force encoding when writing txt file with ofstream

I am writing a txt file using ofstream; for various reasons the file should have a local encoding and not UTF-8.
The machine that processes the file has a different locale than the target.
Is there a way to force the encoding when writing a file?
regards,
Ilan
You can call std::ios::imbue on your ofstream object to modify the locale. This won't affect the global locale.
std::ofstream os("output.txt");
std::locale mylocale("");
os.imbue(mylocale);
os << 1.5f << std::endl;
os.close();
Pay attention to the argument of the std::locale constructor; it is implementation-dependent. For example, the German locale could be:
std::locale mylocale("de_DE");
or
std::locale mylocale("German");
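Since locale names differ between platforms, the claim that imbue affects only the stream it is called on can be checked without naming any installed locale. The sketch below builds a locale from the classic one plus a custom numpunct facet; the names comma_decimal and format_with_comma are hypothetical, and it concerns number formatting only, not character encoding.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Facet that prints ',' as the decimal separator, as many European
// locales do, without depending on any locale installed on the system.
struct comma_decimal : std::numpunct<char> {
    char do_decimal_point() const override { return ','; }
};

// Formats `value` through a stream imbued with the custom facet.
// The global locale and all other streams are left unchanged.
std::string format_with_comma(double value) {
    std::ostringstream os;
    // The locale takes ownership of the facet and deletes it when done.
    os.imbue(std::locale(std::locale::classic(), new comma_decimal));
    os << value;
    return os.str();
}
```

A stream created afterwards without imbue still formats 1.5 with a period, which shows that the change is local to the imbued stream.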
Well, given that it's Windows, you'd not have UTF8 anyway. But exactly what are you writing? Usually, you have a std::string in memory and write that to disk. The only difference is that \n in memory is translated to CR/LF (\r\n) on disk. That's the same translation everywhere.
You might encounter a situation where you're writing a std::wstring. In that case, it's determined by the locale. The default locale is the C locale, aka std::locale("C") or std::locale::classic(). The local encoding (which you seem to want) is std::locale("").
Other locales exist; see here