This is a follow-up to the question asked and answered here. I want to use a text file as a resource and then load it into a stringstream so that I can parse it.
The following code shows what I currently have:
std::string filename("resource.txt");
HRSRC hrsrc = FindResource(GetModuleHandle(NULL), filename.c_str(), RT_RCDATA);
HGLOBAL res = LoadResource(GetModuleHandle(NULL), hrsrc);
LPBYTE data = (LPBYTE)LockResource(res);
std::stringstream stream((LPSTR)data);
However, I am unsure of how to extend this to read a unicode text file using a wstringstream. The naive approach yields unreadable characters:
...
LPBYTE data = (LPBYTE)LockResource(res);
std::wstringstream wstream((LPWSTR)data);
Since LPBYTE is nothing more than a BYTE* (an unsigned char*), it is no surprise that this doesn't work, but naively converting the resource to a WCHAR* (LPWSTR) does not work either:
...
LPWSTR data = (LPWSTR)LockResource(res);
std::wstringstream wstream(data);
I am guessing this is because a WCHAR is 16-bit instead of 8-bit like a CHAR, but I'm not sure how to work around this.
Thanks for any help!
Your comment supplies the key missing detail. The file that you compiled into a resource is encoded as UTF-8. So the obvious options are:
Using the code in your question, get a pointer to the resource, encoded as UTF-8, and pass that to MultiByteToWideChar to convert it to UTF-16, which you can then put into a wstring.
Convert the file that you compile into a resource to UTF-16, before you compile the resource. Then the code in your question will work.
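To make the first option more concrete, here is a minimal portable sketch of what a UTF-8 to UTF-16 conversion involves. The helper name utf8_to_utf16 is hypothetical and not part of any library; it assumes well-formed input and performs no validation. On Windows you would normally let MultiByteToWideChar with CP_UTF8 do this work rather than hand-rolling it.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decodes well-formed UTF-8 into UTF-16 code units.
// Malformed input is NOT validated; this only illustrates the conversion.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 1;
        // The lead byte tells us how many bytes make up this code point.
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }
        // Fold in the continuation bytes (6 payload bits each).
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above U+FFFF become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The point to take away is that one UTF-8 byte does not correspond to one wide character, which is why casting the resource pointer to LPWSTR cannot work.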
There are lots of times this question has been asked and as many answers - none of which work for me and, it seems, many others. The question is about wide CStrings and 8bit chars under MFC. We all want an answer that will work in ALL cases, not a specific instance.
void Dosomething(CString csFileName)
{
char cLocFileNamestr[1024];
char cIntFileNamestr[1024];
// Convert from whatever version of CString is supplied
// to an 8 bit char string
cIntFileNamestr = ConvertCStochar(csFileName);
sprintf_s(cLocFileNamestr, "%s_%s", cIntFileNamestr, "pling.txt" );
m_KFile = fopen(cLocFileNamestr, "wt");
}
This is an addition to existing code (by somebody else) for debugging.
I don't want to change the function signature, it is used in many places.
I cannot change the signature of sprintf_s, it is a library function.
You are leaving out a lot of details, or ignoring them. If you are building with UNICODE defined (which it seems you are), then the easiest way to convert to MBCS is like this:
CStringA strAIntFileNameStr = csFileName.GetString(); // uses default code page
CStringA is the 8-bit/MBCS version of CString.
However, any characters in the Unicode string that do not exist in the default code page cannot be converted and are replaced with a default character (typically '?').
Instead of using fopen(), you could use _wfopen(), which opens a file with a Unicode filename. To build the file name, you would use swprintf_s().
"...an answer that will work in ALL cases, not a specific instance..."
There is no such thing.
It's easy to convert "ABCD..." from wchar_t* to char*, but it doesn't work that way with non-Latin languages.
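A small portable sketch shows why. The helper naive_narrow is hypothetical; it mimics a blind per-character cast by keeping only the low byte of each 16-bit code unit, which is an assumption about what a "convert everything" routine might do, not any real library function.

```cpp
#include <string>

// Hypothetical helper: narrows by keeping only the low byte of each
// 16-bit code unit, which is what a blind per-character cast does.
std::string naive_narrow(const std::u16string& wide) {
    std::string out;
    for (char16_t unit : wide)
        out.push_back(static_cast<char>(unit & 0xFF));
    return out;
}
```

With this helper, u"ABCD" survives unchanged, but the Cyrillic letter U+0420 collapses to the byte 0x20 (a space), so the original text cannot be recovered.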
Stick to CString and wchar_t when your project is unicode.
If you need to send data to a web page or another UTF-8 system, use CW2A and CA2W to convert between UTF-16 and UTF-8.
CStringW unicode = L"Россия";
MessageBoxW(0,unicode,L"Russian",0);//should be okay
CStringA utf8 = CW2A(unicode, CP_UTF8);
::MessageBoxA(0,utf8,"format error",0);//WinApi doesn't get UTF-8
char buf[1024];
strcpy(buf, utf8);
::MessageBoxA(0,buf,"format error",0);//same problem
//send this buf to webpage or other utf-8 systems
//this should be compatible with notepad etc.
//text will appear correctly
ofstream f(L"c:\\stuff\\okay.txt");
f.write(buf, strlen(buf));
//convert utf8 back to utf16
unicode = CA2W(buf, CP_UTF8);
::MessageBoxW(0,unicode,L"okay",0);
Currently I am attempting to convert a QString to a LPCWSTR that will be used in URLDownloadToFile(). The following is a simple version of my current code:
QString url = "http://whatever_file...";
HRESULT hRez = URLDownloadToFile(NULL, (LPCWSTR)url.toLocal8Bit().constData(), TEXT("C:\\etc..."), 0, NULL);
I took the conversion from a post about converting a QString to an LPCSTR, where it reportedly worked. I am rather new to programming, so I simply added a letter to that solution because URLDownloadToFile requires an LPCWSTR. It returns no error; however, the download fails.
What am I missing here?
To get an LPCWSTR from a QString you can use the QString::constData method (cast to LPCWSTR) or QString::utf16(), because a QChar is a 2-byte Unicode code unit, exactly like WCHAR (provided wchar_t is 2 bytes on the target machine, as it is on Windows).
I would also warn you against using the TEXT macro on the same line as LPCWSTR. You should use the L prefix instead.
TEXT is meant to be used together with LPCTSTR. You can read this about them.
I am creating some file open dialog and stumbled across something inconsistent in the WinAPI (hehe).
I fully understand why lpstrFile is a LPSTR as the path is written into this variable.
Fine, but why is lpstrFileTitle not LPCSTR? I've read the docs at MSDN and googled around and found no satisfying explanation as it doesn't look like it is modified in any way.
Is this a compatibility remnant or something?
This causes annoying workarounds when passing a std::string, as I cannot use c_str() and have to resort to &str[0].
lpstrFileTitle is also an output buffer. It contains the name and extension without path information of the selected file.
Related side-note: You must set lpstrFileTitle to a valid buffer for non-Unicode builds.
The docs for OPENFILENAME state that the field is ignored if the pointer is null. However, since at least VS2008 the MFC CFileDialog code has included this code:
VC\atlmfc\src\mfc\dlgfile.cpp
void CFileDialog::UpdateOFNFromShellDialog()
{
...
#ifdef UNICODE
...
#else
::WideCharToMultiByte(CP_ACP, 0, wcPathName + offset,-1, m_ofn.lpstrFileTitle, m_ofn.nMaxFileTitle, NULL, NULL);
m_ofn.lpstrFileTitle[m_ofn.nMaxFileTitle - 1] = _T('\0');
#endif
...
The Unicode support correctly handles a NULL lpstrFileTitle and the WideCharToMultiByte basically does nothing. However, the added code to safely terminate the buffer does not check for a null pointer or a nMaxFileTitle==0. The result is an access violation.
Better of course to kill multibyte apps, but if you must compile that way, you have to supply that buffer.
I am working on a project where I am stuck on the problem of converting a CStringW to a CStringA for multibyte strings such as Japanese.
I am loading the string from string resources using LoadString() Method.
I have tried following code but it does not seem to work.
CStringW csTest;
csTest.LoadString(JAPANESE_STRING);
CStringA Msg = CStringA(csTest); // Msg comes back as a blank string
And
std::string Msg = CW2A(csTest); // Msg comes back as a blank string
I have also tried wcstombs().
Can anyone tell me how I can convert CStringW to CStringA?
Thanks in advance.
CStringW stores Unicode UTF-16 strings.
What encoding do you expect for your CStringA?
Do you want UTF-8?
In this case, you can do:
// strUtf16 is a CStringW.
// Convert from UTF-16 to UTF-8
CStringA strUtf8 = CW2A(strUtf16, CP_UTF8);
Talking about CStringA without specifying an encoding doesn't make sense.
The second parameter of CW2A is what is passed to the WideCharToMultiByte() Win32 API as CodePage (note that CW2A is essentially a convenient, safe C++ RAII wrapper around this API). If you follow the API documentation, you can find several "code page" values (i.e. encodings).
I use the following code to read all of the elements from a file through a working handle hFile, using the size I got from GetFileSize(hFile, NULL).
_TCHAR* text = (_TCHAR*)malloc(sizeOfFile * sizeof(_TCHAR));
DWORD numRead = 0;
BOOL didntFail = ReadFile(hFile, text, sizeOfFile, &numRead, NULL);
After the operation, text contains something strange that looks like Japanese, not the content of the file.
What did I do wrong?
Edit:
I understand it is an encoding problem, but then how do I convert text to an LPCWSTR so I can use things like WriteConsoleOutputCharacter?
Modern IDEs default to Unicode applications, meaning _TCHAR is actually wchar_t. ReadFile() works with simple bytes and if you use it to fill a _TCHAR array directly, you'll get 8-bit characters interpreted as UTF-16 Unicode. These usually show as CJK (Chinese/Japanese/Korean) glyphs.
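The effect can be reproduced without any Windows API. The sketch below uses a hypothetical helper, misread_as_utf16le, that pairs up raw bytes the way a little-endian machine does when an 8-bit buffer is viewed as 16-bit characters; the name and the explicit byte arithmetic are illustrative assumptions only.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: pairs up raw bytes the way a little-endian machine
// would if the buffer were (wrongly) viewed as 16-bit characters.
std::u16string misread_as_utf16le(const std::string& bytes) {
    std::u16string out;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        unsigned lo = static_cast<unsigned char>(bytes[i]);
        unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        // Low byte first, high byte second: little-endian layout.
        out.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return out;
}
```

For the ASCII text "Hello!" this produces three code units, the first two being U+6548 and U+6C6C, both of which fall in the CJK Unified Ideographs block (U+4E00 to U+9FFF). That is why plain English text read this way looks like Chinese or Japanese.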
You have three options:
convert your program to non-Unicode
use a file containing Unicode text (in UTF-16 encoding), or
read from the file into a char array and then use MultiByteToWideChar() to convert the text to Unicode.
If you mix Unicode and non-Unicode be careful to calculate the correct buffer sizes (number of bytes vs. number of characters).
Note that you can still use narrow chars with Windows in your Unicode program if you call the ANSI version of the Windows function (e.g. WriteConsoleOutputCharacterA).
You got the type of the string wrong. Text from a file that was encoded in an 8-bit encoding will look like Chinese when you look at it through a character type, like TCHAR with UNICODE defined, that uses a 16-bit encoding. Fix:
char* text = (char*)malloc(...);
You do normally have to fret a lot more about the encoding that was used to write the text. It could be utf-8 for example. You can convert from the 8-bit encoding to a TCHAR (wchar_t, really) with MultiByteToWideChar(). Its first argument is the one to fret about.
You have read an ANSI or UTF-8 text file into a UTF-16 string.
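char RawBuff[1024] = {0};      // raw bytes exactly as stored in the file
wchar_t ReadBuff[1024] = {0};  // UTF-16 text after conversion
HANDLE hFile = CreateFile(szPathFileName, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
DWORD NumberOfBytesRead = 0;
// Leave room for the terminating null so RawBuff stays a valid C string.
ReadFile(hFile, RawBuff, sizeof(RawBuff) - 1, &NumberOfBytesRead, NULL);
CloseHandle(hFile);
// Convert into a separate buffer; use CP_UTF8 instead of CP_ACP for a UTF-8 file.
MultiByteToWideChar(CP_ACP, 0, RawBuff, -1, ReadBuff, 1024);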
ReadBuff is now in readable form.