Unable to convert the CStringW to CStringA - c++

I am working on one project where I have stucked on one problem of converting CStringW to CStringA for multibyte string like Japanese Language.
I am loading the string from string resources using LoadString() Method.
I have tried following code but it does not seem to work.
CStringW csTest;
csTest.LoadString(JAPANESE_STRING);
CStringA Msg = CStringA(csTest); // Msg has been returned blank string
And
std::string Msg = CW2A(csTest);// Msg has been returned blank string
I have also tried
wcstombs() too.
Can anyone tell me how I can convert CStringW to CString?
Thanks in advance.

CStringW stores Unicode UTF-16 strings.
What encoding do you expect for your CStringA?
Do you want UTF-8?
In this case, you can do:
// strUtf16 is a CStringW.
// Convert from UTF-16 to UTF-8
CStringA strUtf8 = CW2A(strUtf16, CP_UTF8);
Talking about CStringA without specifying an encoding doesn't make sense.
The second parameter of CW2A is a what is passed to WideCharToMultiByte() Win32 API as CodePage (note that CW2A is essentially a convenient safe C++ RAII wrapper around this API). If you follow this API documentation, you can find several "code page" values (i.e. encodings).

Related

Converting C++ std::wstring to utf8 with std::codecvt_xxx

C++11 has tools to convert wide char strings std::wstring from/to utf8 representation: std::codecvt, std::codecvt_utf8, std::codecvt_utf8_utf16 etc.
Which one is usable by Windows app to convert regular wide char Windows strings std::wstring to utf8 std::string? Is it always works without configuring locales?
Depends how you convert them.
You need to specify the source encoding type and the target encoding type.
wstring is not a format, it just defines a data type.
Now usually when one says "Unicode", one means UTF16 which is what Microsoft Windows uses, and that is usuasly what wstring contains.
So, the right way to convert from UTF8 to UTF16:
std::string utf8String = "blah blah";
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> convert;
std::wstring utf16String = convert.from_bytes( utf8String );
And the other way around:
std::wstring utf16String = "blah blah";
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> convert;
std::string utf8String = convert.to_bytes( utf16String );
And to add to the confusion:
When you use std::string on a windows platform (like when you use a multibyte compilation), It's NOT UTF8. They use ANSI.
More specifically, the default encoding language your windows is using.
Also, note that wstring is not exactly the same as UTF-16.
When compiling in Unicode the windows API commands expect these formats:
CommandA - multibyte - ANSI
CommandW - Unicode - UTF16
Seems that std::codecvt_utf8 works well for conversion std::wstring -> utf8. It passed all my tests. (Windows app, Visual Studio 2015, Windows 8 with EN locale)
I needed a way to convert filenames to UTF8. Therefore my test is about filenames.
In my app I use boost::filesystem::path 1.60.0 to deal with file path. It works well, but not able to convert filenames to UTF8 properly.
Internally Windows version of boost::filesystem::path uses std::wstring to store the file path. Unfortunately, build-in conversion to std::string works bad.
Test case:
create file with mixed symbols c:\test\皀皁皂皃的 (some random Asian symbols)
scan dir with boost::filesystem::directory_iterator, get boost::filesystem::path for the file
convert it to the std::string via build-in conversion filenamePath.string()
you get c:\test\?????. Asian symbols converted to '?'. Not good.
boost::filesystem uses std::codecvt internally. It doesn't work for conversion std::wstring -> std::string.
Instead of build-in boost::filesystem::path conversion you can define conversion function as this (original snippet):
std::string utf8_to_wstring(const std::wstring & str)
{
std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
return myconv.to_bytes(str);
}
Then you can convert filepath to UTF8 easily: utf8_to_wstring(filenamePath.wstring()). It works perfectly.
It works for any filepath. I tested ASCII strings c:\test\test_file, Asian strings c:\test\皀皁皂皃的, Russian strings c:\test\абвгд, mixed strings c:\test\test_皀皁皂皃的, c:\test\test_абвгд, c:\test\test_皀皁皂皃的_абвгд. For every string I receive valid UTF8 representation.

Convert wide CString to char*

There are lots of times this question has been asked and as many answers - none of which work for me and, it seems, many others. The question is about wide CStrings and 8bit chars under MFC. We all want an answer that will work in ALL cases, not a specific instance.
void Dosomething(CString csFileName)
{
char cLocFileNamestr[1024];
char cIntFileNamestr[1024];
// Convert from whatever version of CString is supplied
// to an 8 bit char string
cIntFileNamestr = ConvertCStochar(csFileName);
sprintf_s(cLocFileNamestr, "%s_%s", cIntFileNamestr, "pling.txt" );
m_KFile = fopen(LocFileNamestr, "wt");
}
This is an addition to existing code (by somebody else) for debugging.
I don't want to change the function signature, it is used in many places.
I cannot change the signature of sprintf_s, it is a library function.
You are leaving out a lot of details, or ignoring them. If you are building with UNICODE defined (which it seems you are), then the easiest way to convert to MBCS is like this:
CStringA strAIntFileNameStr = csFileName.GetString(); // uses default code page
CStringA is the 8-bit/MBCS version of CString.
However, it will fill with some garbage characters if the unicode string you are translating from contains characters that are not in the default code page.
Instead of using fopen(), you could use _wfopen() which will open a file with a unicode filename. To create your file name, you would use swprintf_s().
an answer that will work in ALL cases, not a specific instance...
There is no such thing.
It's easy to convert "ABCD..." from wchar_t* to char*, but it doesn't work that way with non-Latin languages.
Stick to CString and wchar_t when your project is unicode.
If you need to upload data to webpage or something, then use CW2A and CA2W for utf-8 and utf-16 conversion.
CStringW unicode = L"Россия";
MessageBoxW(0,unicode,L"Russian",0);//should be okay
CStringA utf8 = CW2A(unicode, CP_UTF8);
::MessageBoxA(0,utf8,"format error",0);//WinApi doesn't get UTF-8
char buf[1024];
strcpy(buf, utf8);
::MessageBoxA(0,buf,"format error",0);//same problem
//send this buf to webpage or other utf-8 systems
//this should be compatible with notepad etc.
//text will appear correctly
ofstream f(L"c:\\stuff\\okay.txt");
f.write(buf, strlen(buf));
//convert utf8 back to utf16
unicode = CA2W(buf, CP_UTF8);
::MessageBoxW(0,unicode,L"okay",0);

How do I convert wchar_t* to string?

I am new to C++.
And I am trying to convert wchar_t* to string.
I cannot use wstring in condition.
I have code below:
wchar_t *wide = L"中文";
wstring ret = wstring( wide );
string str2( ret.begin(), ret.end() );
But str2 returns some strange characters.
Where do I have to fix it?
You're trying to do it backwards. Instead of truncating wide characters to chars (which is very lossy), expand your chars to wide characters.
That is, transform your std::string into an std::wstring and concatenate the two std::wstrings.
I'm not sure what platform you're targeting. If you're on Windows platform you can call WideCharToMultiByte API function. Refer to MSDN for documentation.
If you're on Linux, I think you can use libiconv functions, try google.
Of course there is a port of libiconv for Windows.
In general this is a quite complex topic for a new beginners if you know nothing about character encodings - there are a lot of background knowledge to have to learn.

Loading Win32 resource file as wstringstream

This a follow up of the question asked and answered here. I want to use a text file as a resource and then load it as a stringstream so that I can parse it.
The following code shows what I currently have:
std::string filename("resource.txt");
HRSRC hrsrc = FindResource(GetModuleHandle(NULL), filename.c_str(), RT_RCDATA);
HGLOBAL res = LoadResource(GetModuleHandle(NULL), hrsrc);
LPBYTE data = (LPBYTE)LockResource(res);
std::stringstream stream((LPSTR)data);
However, I am unsure of how to extend this to read a unicode text file using a wstringstream. The naive approach yields unreadable characters:
...
LPBYTE data = (LPBYTE)LockResource(res);
std::wstringstream wstream((LPWSTR)data);
Since LPBYTE is nothing more than a CHAR*, it is no surprise that this doesn't work, but naively converting the resource to a WCHAR* (LPWSTR) does not work either:
...
LPWSTR data = (LPWSTR)LockResource(res);
std::wstringstream wstream(data);
I am guessing this is because a WCHAR is 16-bit instead of 8-bit like a CHAR, but I'm not sure how to work around this.
Thanks for any help!
Your comment supplies the key missing detail. The file that you compiled into a resource is encoded as UTF-8. So the obvious options are:
Using the code in your question, get a pointer to the resource, encoded as UTF-8, and pass that MultiByteToWideChar to convert to UTF-16. Which you can then put into a wstring.
Convert the file that you compile into a resource to UTF-16, before you compile the resource. Then the code in your question will work.

Storing and retrieving UTF-8 strings from Windows resource (RC) files

I created an RC file which contains a string table, I would like to use some special
characters: ö ü ó ú ő ű á é. so I save the string with UTF-8 encoding.
But when I call in my cpp file, something like this:
LoadString("hu.dll", 12, nn, MAX_PATH);
I get a weird result:
How do I solve this problem?
As others have pointed out in the comments, the Windows APIs do not provide direct support for UTF-8 encoded text. You cannot pass the MessageBox function UTF-8 encoded strings and get the output that you expect. It will, instead, interpret them as characters in your local code page.
To get a UTF-8 string to pass to the Windows API functions (including MessageBox), you need to use the MultiByteToWideChar function to convert from UTF-8 to UTF-16 (what Windows calls Unicode, or wide strings). Passing the CP_UTF8 flag for the first parameter is the magic that enables this conversion. Example:
std::wstring ConvertUTF8ToUTF16String(const char* pszUtf8String)
{
// Determine the size required for the destination buffer.
const int length = MultiByteToWideChar(CP_UTF8,
0, // no flags required
pszUtf8String,
-1, // automatically determine length
nullptr,
0);
// Allocate a buffer of the appropriate length.
std::wstring utf16String(length, L'\0');
// Call the function again to do the conversion.
if (!MultiByteToWideChar(CP_UTF8,
0,
pszUtf8String,
-1,
&utf16String[0],
length))
{
// Uh-oh! Something went wrong.
// Handle the failure condition, perhaps by throwing an exception.
// Call the GetLastError() function for additional error information.
throw std::runtime_error("The MultiByteToWideChar function failed");
}
// Return the converted UTF-16 string.
return utf16String;
}
Then, once you have a wide string, you will explicitly call the wide-string variant of the MessageBox function, MessageBoxW.
However, if you only need to support Windows and not other platforms that use UTF-8 everywhere, you will probably have a much easier time sticking exclusively with UTF-16 encoded strings. This is the native Unicode encoding that Windows uses, and you can pass these types of strings directly to any of the Windows API functions. See my answer here to learn more about the interaction between Windows API functions and strings. I recommend the same thing to you as I did to the other guy:
Stick with wchar_t and std::wstring for your characters and strings, respectively.
Always call the W variants of Windows API functions, including LoadStringW and MessageBoxW.
Ensure that the UNICODE and _UNICODE macros are defined either before you include any of the Windows headers or in your project's build settings.