C++ UTF-16 ofstream file creation on Windows [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to open an std::fstream (ofstream or ifstream) with a unicode filename?
I have a string encoded in UTF-16, and I want to create a file whose name is this string.
The UTF-16LE string looks like this (screenshot omitted):
First I want to make sure that the system sees and displays this name correctly.
I try:
char *output = /* address of the buffer holding the UTF-16 string */;
ofstream out(output);
out.close();
The output file does not get the proper name; it looks like this (screenshot omitted):
For the highlighted file I prepended a UTF-16LE byte order mark; the non-highlighted file was created from the raw UTF-16 string. Neither approach works.
Is there any way to create files with UTF-16LE names on Windows using only standard C++, without WinAPI (CreateFileW)?
My compiler is MinGW 4.0.4 on Windows XP (but I want it to work on all Windows versions).
Thanks in advance for any tips!

Thank you all, but it seems that C++ streams are helpless in this case (at least that is the conclusion I reached).
So I used WinAPI:
#ifndef _WIN32 // on Linux the narrow name works directly
ofstream out(output);
out.close();
#else // on Windows, use CreateFileW with the UTF-16 name
LPWSTR lp = (LPWSTR)output; // output already points at UTF-16LE data
HANDLE h = CreateFileW(lp, GENERIC_READ | GENERIC_WRITE,
                       FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                       CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
if (h != INVALID_HANDLE_VALUE)
    CloseHandle(h); // CreateFileW returns a handle that must be closed
#endif
And I got an output file with the correct name (screenshot omitted).
Thanks again!

Related

CStdioFile problems with encoding on read file

I can't read a file correctly using CStdioFile.
I open notepad.exe, type àèìòùáéíóú, and save the file twice: once with the encoding set to ANSI (which is really CP-1252) and once as UTF-8.
Then I try to read it from MFC with the following block of code:
BOOL ReadAllFileContent(const CString &FilePath, CString *fileContent)
{
    CString sLine;
    BOOL isSuccess = false;
    CStdioFile input;
    isSuccess = input.Open(FilePath, CFile::modeRead);
    if (isSuccess) {
        while (input.ReadString(sLine)) {
            fileContent->Append(sLine);
        }
        input.Close();
    }
    return isSuccess;
}
When I call it with the ANSI file, I get the expected result: àèìòùáéíóú.
But when I try to read the UTF-8 encoded file, I get: à èìòùáéíóú
I would like my function to work with all files regardless of their encoding.
What do I need to implement?
EDIT:
Unfortunately, in the real app the files come from an external application, so changing the file encoding isn't an option. I must be able to read both UTF-8 and CP-1252 files.
Any byte sequence is valid ANSI; what Notepad calls ANSI is really the Windows-1252 encoding.
I've figured out a way to read UTF-8 and CP-1252 correctly based on the example provided here. Although it works, I need to pass the file encoding, which I don't know in advance.
Thanks!
I personally use the class presented here:
https://www.codeproject.com/Articles/7958/CTextFileDocument
It has excellent support for reading and writing text files in various encodings, including Unicode in its various flavours.
I have not had a problem with it.

Windows usage of char * functions with UTF-16

I am porting an application from Linux to Windows.
On Linux I use the libmagic library, which I would not be glad to part with on Windows.
The problem is that I need to pass the name of a file, held in UTF-16 encoding, to this function:
int magic_load(magic_t cookie, const char *filename);
Unfortunately it accepts only const char *filename. My first idea was to convert the UTF-16 string to the local encoding, but there are problems: the string can contain, for example, Chinese characters while the local encoding is Russian.
The result would be garbage, and the program would not achieve its aim.
Converting to UTF-8 doesn't help either, because this is Windows, and Windows holds file names in UTF-16.
But I somehow need to make that function able to open a file with a Unicode name.
I came up with only one very, very bad solution:
1. I have a file name.
2. I copy the file with the Unicode name to a file with an ASCII name like "1.mp3".
3. I open it with the libmagic functions and get what I want.
4. I remove the temporary file.
But I understand how bad this solution is and how it could slow my application down, so I wonder: perhaps there are better ways to do it?
Thanks in advance for any tips, because I'm really confused by this.
Use 8.3 file names to access the files.
In addition to long file names up to 255 characters in length, Windows also generates an MS-DOS-compatible (short) file name in 8.3 format.
http://support.microsoft.com/kb/142982

c++ fstreams open file with utf-16 name

At first I built my project on Linux, and it was built around streams.
When I started moving to Windows, I ran into some problems.
I have the name of the file that I want to open, in UTF-16 encoding.
I try to do it using fstream:
QString source; // content of source is shown on the image
char *op = (char *) source.data();
fstream stream(op, std::ios::in | std::ios::binary);
But the file cannot be opened.
When I check it with
if(!stream.is_open())
{} // I always get that it's not opened, but the file does exist.
I tried it with wstream, but the result is the same, because wstream also accepts only a char * name. As I understand it, this is because the string passed as char * is truncated at the first zero byte, so only one character of the file name is sent and the file is never found. I know that wfstream in Visual Studio can accept a wchar_t * name, but my compiler of choice is MinGW, and it has no such constructor overload.
Is there any way to do it with STL streams?
ADDITION:
The string can contain not only ASCII symbols; it can contain Russian, German, and Chinese characters simultaneously. I don't want to limit myself to ASCII or the local encoding.
NEXT ADDITION:
The data can also differ, not only ASCII; otherwise I wouldn't bother with Unicode at all.
E.g. (screenshot omitted)
Thanks in advance!
Boost.Filesystem, especially its fstream.hpp header, may help.
If you are using MSVC and its implementation of the C++ standard library, something like this should work:
QString source; // content of source is shown on the image
wchar_t *op = (wchar_t *) source.data(); // QChar is 2 bytes, matching wchar_t on Windows
fstream stream(op, std::ios::in | std::ios::binary);
This works because Microsoft's C++ library has an extension that allows an fstream to be opened with a wide-character string.
Convert the UTF-16 string using WideCharToMultiByte with CP_ACP before passing the filename to fstream.

FStream reading a binary file written with Delphi's binary writer

I'm creating a DLL in MS Visual Studio 2010 Express which loads a binary data file (*.mgr extension, used exclusively in my company's applications) using the fstream library in C++. The file is created with an app developed by someone else in my company, who is using Delphi. He says the first 15 bytes should be characters indicating the date the file was created and some other data such as the version of the app:
"XXXX 2012".
The result after loading with fstream (in binary mode) and writing to another file with fstream (in text mode) is as follows:
"[] X X X X 2 0 1 2"
The first char is an unknown character (a rectangle), and there are spaces between the characters. In total it is 31 bytes wide: 15 for the actual chars + 15 for the white spaces + 1 for the rectangle char = 31.
Some other information:
I'm using C++; the app developer is using Delphi.
I'm using fstream; he is using a BW.Write() function (BW = binary writer?).
He uses Windows 7 while I use Windows XP Professional.
Can you diagnose the problem?
Thanks in advance.
First edit: I'm adding the C++ code that loads those first bytes.
He is using Delphi XE2 from Embarcadero RAD Studio XE2.
From what I know, PChar is a null-terminated string consisting of WideChars (since Delphi 2009), which are 2 bytes wide as opposed to normal 1-byte chars. So basically he's saving 2-byte words instead of bytes.
Here is the code loading the .mgr file:
wchar_t header[15]; // note: sizeof(header) is 30 bytes, not 15

DXFLIBRARY_API void loadMGR(const char* szFileName, const char* szOutput)
{
    fstream file;
    file.open(szFileName, ios::binary | ios::in);
    if (file.is_open())
    {
        file.read(reinterpret_cast<char*>(header), sizeof(header));
    }
    file.close();

    // write the header back out (text mode)
    fstream saveFile;
    saveFile.open(szOutput, ios::out);
    if (saveFile.is_open())
    {
        saveFile.write(reinterpret_cast<const char*>(header), sizeof(header));
    }
    saveFile.close();
}
The header contains 15 wchar_t's, so we read 30 bytes. Still, after investigating, I have no idea how to convert it.
It seems pretty clear that somewhere along the way the data is being mangled between an 8-bit text encoding and a 16-bit encoding. The spurious first character is almost certainly the UTF-16 byte order mark (BOM).
One possible explanation is that the Delphi developer is writing UTF-16 encoded text to the file, while you are presumably expecting an 8-bit encoding.
Another explanation is that the Delphi code is correctly writing out 8-bit text, but your code is mangling it; perhaps your read/write code is doing that.
Use a hex editor on the file output from the Delphi program to narrow down exactly where the mangling occurs.
In the absence of any code in the question, it's hard to be more specific than this.

fstream::open() Unicode or Non-Ascii characters don't work (with std::ios::out) on Windows

In a C++ project, I want to open a file (fstream::open()), which turns out to be a major problem: the Windows build of my program fails miserably.
File "ä" (UTF-8 0xC3 0xA4):
std::string s = ...;
//Convert s
std::fstream f;
f.open(s.c_str(), std::ios::binary | std::ios::in); //Works (f.is_open() == true)
f.close();
f.open(s.c_str(), std::ios::binary | std::ios::in | std::ios::out); //Doesn't work
The string s is UTF-8 encoded and then converted from UTF-8 to Latin-1 (0xE4); since I'm using Qt, the conversion is QString::fromUtf8(s.c_str()).toLocal8Bit().constData().
Why can I open the file for reading, but not for writing?
File "и" (UTF-8 0xD0 0xB8):
The same code doesn't work at all.
It seems this character doesn't fit in the Windows-1252 charset. How can I open such an fstream (I'm not using MSVC, so I have no fstream::open(const wchar_t*, ios_base::openmode))?
In Microsoft's implementation of the C++ standard library, there is a non-standard extension (overload) that provides Unicode support via UTF-16 encoded strings.
Just pass a UTF-16 encoded std::wstring to fstream::open(). This is the only way to make it work with fstream.
You can read more about what I find to be the easiest way to support Unicode on Windows here: http://utf8everywhere.org/
Using the standard APIs (such as std::fstream) on Windows, you can only open a file if its name can be encoded using the currently set "ANSI codepage" (CP_ACP).
This means that there can be files which simply cannot be opened using these APIs on Windows. Unless Microsoft implements support for setting CP_ACP to CP_UTF8, this cannot be done using Microsoft's CRT or C++ standard library implementation.
(Windows has a feature called "short" file names where, when enabled, every file on the drive has an ASCII file name that can be used via the standard APIs. However, this feature is going away, so it does not represent a viable solution.)
Update: Windows 10 has added support for setting the codepage to UTF-8.