call avio_open function with non-english filename is invalid - c++

i have been writing unicode based program with libav and i wanna make some file through libav with filename "中.mp4".
this filename is not english, and when i call, function return positive integer(not fail).
but there is "ѱ۰.mp4" instead of "中.mp4". (invalid file name.)
what's the matter?
char * szFilenameA = 0;
#ifdef _UNICODE
CSHArray<char> aFilenameBuffer;
aFilenameBuffer.Alloc(lstrlen(szFileName) * 2);
ZeroMemory(aFilenameBuffer, aFilenameBuffer.GetSize());
WideCharToMultiByte(CP_ACP, 0, szFileName, lstrlen(szFileName), aFilenameBuffer, aFilenameBuffer.GetSize(), NULL, NULL);
szFilenameA = aFilenameBuffer;
#else
szFilenameA = (TCHAR *)szFileName;
#endif
ZeroMemory(m_pOutputFormatCtx->filename,1024);
_snprintf(m_pOutputFormatCtx->filename, strlen(szFilenameA), "%s", szFilenameA);
avio_open(&m_pOutputFormatCtx->pb, szFilenameA, AVIO_FLAG_WRITE)

finally!
it's because of charset.
convert ansi filename to UTF8 and then it works fine.
int ANSIToUTF8(char *pszCode, char *UTF8code)
{
WCHAR Unicode[100]={0,};
char utf8[100]={0,};
// read char Lenth
int nUnicodeSize = MultiByteToWideChar(CP_ACP, 0, pszCode, strlen(pszCode), Unicode, sizeof(Unicode));
// read UTF-8 Lenth
int nUTF8codeSize = WideCharToMultiByte(CP_UTF8, 0, Unicode, nUnicodeSize, UTF8code, sizeof(Unicode), NULL, NULL);
// convert to UTF-8
MultiByteToWideChar(CP_UTF8, 0, utf8, nUTF8codeSize, Unicode, sizeof(Unicode));
return nUTF8codeSize;
}

Related

BSTR conversion to UTF-8

I'm working with UIAutomation and I'm struggling with the localized BSTRs. I'm in Germany, so there are some special characters that are represented funny in the BSTRs. I'm logging the information and need to have them in UTF-8 to process later on.
I tried already every version of the answers that I could find regarding to WideCharToMultiByte, but that's just converting the funny character into an even funnier one. I'd really appreciate if anyone could tell me what I'm doing wrong, it's really bugging me.
So I tried both of the following versions and got both times this result (the upper one is the converted one, the lower the original one):
The first word should be "Schaltfläche" and the second "Fünf".
My tried code:
BSTR* origin;
_bstr_t originWrapper(*origin);
char* originChar = originWrapper;
size_t len = strlen(originChar) + 1;
int room = MultiByteToWideChar(CP_ACP, 0, originChar, -1, NULL, 0);
wchar_t* unicodeString = (wchar_t*)malloc((sizeof(wchar_t))*room);
MultiByteToWideChar(CP_ACP, 0, originChar, -1, unicodeString, room);
int size_needed = WideCharToMultiByte(CP_UTF8, 0, unicodeString, -1, NULL, 0, NULL, NULL);
char* utf8Char = (char*) malloc(size_needed);
WideCharToMultiByte(CP_UTF8, 0, unicodeString, -1, utf8Char, size_needed, NULL, NULL);
and
BSTR* origin;
_bstr_t originWrapper(*origin);
int size_needed = WideCharToMultiByte(CP_UTF8, 0, originWrapper, SysStringByteLen(*origin), NULL, 0, NULL, NULL);
std::string resultingString(size_needed, 0);
WideCharToMultiByte(CP_UTF8, 0, *origin, SysStringByteLen(*origin), &resultingString[0], size_needed, NULL, NULL);
BSTR is a pointer to UTF-16 (WCHAR) character data, preceded by the string length. So, your roundtrip through narrow strings is misguided, you should straight use WideCharToMultiByte:
std::string BSTRtoUTF8(BSTR bstr) {
int len = SysStringLen(bstr);
// special case because a NULL BSTR is a valid zero-length BSTR,
// but regular string functions would balk on it
if(len == 0) return "";
int size_needed = WideCharToMultiByte(CP_UTF8, 0, bstr, len, NULL, 0, NULL, NULL);
std::string ret(size_needed, '\0');
WideCharToMultiByte(CP_UTF8, 0, bstr, len, ret.data(), ret.size(), NULL, NULL);
return ret;
}
To check the validity of the conversion don't output the result to the console, as it doesn't support UTF-8 output by default (it interprets narrow strings not even as in CP_ACP, but in CP_OEM, go figure). Instead, write the output to a file and check it with a reliable editor supporting UTF-8.

Why filename has different bytes after converting UTF16 -> UTF8 -> UTF16 in winapi?

I have next file:
I use ReadDirectoryChangesW for reading changes in current folder.
And I get path to this file: L"TEST Ӡ⬨☐.ipt":
Next, I want to convert this to utf8 and back:
std::string wstringToUtf8(const std::wstring& source) {
const int size = WideCharToMultiByte(CP_UTF8, 0, source.data(), static_cast<int>(source.size()), NULL, 0, NULL, NULL);
std::vector<char> buffer8(size);
WideCharToMultiByte(CP_UTF8, 0, source.data(), static_cast<int>(source.size()), buffer8.data(), size, NULL, NULL);
}
std::wstring utf8ToWstring(const std::string& source) {
const int size = MultiByteToWideChar(CP_UTF8, 0, source.data(), static_cast<int>(source.size()), NULL, 0);
std::vector<wchar_t> buffer16(size);
MultiByteToWideChar(CP_UTF8, 0, source.data(), static_cast<int>(source.size()), buffer16.data(), size);
}
int main() {
// Some code with ReadDirectoryChangesW and
// ...
// std::wstring fileName = "L"TEST Ӡ⬨☐.ipt""
// ...
std::string filenameUTF8 = wstringToUtf8(fileName);
std::wstring filename2 = utf8ToWstring(filenameUTF8);
assert(filenameUTF8 == filename2); // FAIL!
return 0;
}
But I catch assert.
filename2:
Different bits: [29]
Why?
57216 seems to fall in to surrogate pair range, used in UTF-16 to encode non-BMP code points. They need to be given in pairs, or decoding won't give you correct codepoint.
65533 is a special error character which decoder gives because other surrogate is missing.
To put it another way: Your original string is not valid UTF-16 string.
More info on Wikipedia.

Issues Converting wstring to TCHAR [duplicate]

This question already has answers here:
How to convert std::wstring to a TCHAR*?
(6 answers)
Closed 10 years ago.
I'm fairly new to programming, and I'm trying to write a program where a user inputs a date, then that date is added to the file directory name, then that file directory is searched.
Here is what I'm working with below. I have a number of functions to do this.. I've searched online and tried doing the conversion a few different ways and I'm just not understanding it.... so I left off with (what I know is incorrected) a static_cast.
Maybe I'm just not doing the conversion right... basically this will throw it back to a function that uses the WINAPI handler. Whether I can get that to work is a completely different story... Thanks in advance for any help!
wstring fDate;
wstring fileDin;
const TCHAR* s = _T (fileDin);
std::wstring(fDate);
std::wstring(fileDin) =L"Z:\\software\\A\\AC\\" + fDate;
wcout<< fileDin;
cout <<endl;
//wstring fileDin(&arc[1]);
fileDin = static_cast<TCHAR>(arc[1]);
dir(2, arc);
TCHAR can be either wchar_t (when you use Unicode) or char (when you use Multi-byte).
On the other hand std::wstring always contains characters of type wchar_t, so it's better if you use wchar_t* directly instead of TCHAR* (if possible).
Then wchar_t* to std::wstring conversion can be done by using constructor of std::wstring:
wchar_t* wcstr = L"my string";
std::wstring wstr(wcstr);
and std::wstring to wchar_t* by simple calling c_str() method:
wchar_t* wcstr = wstr.c_str();
Then sometimes you might need to convert between "wide" strings (std::wstrings holding wchar_t characaters) and multi-byte strings (std::strings holding chars). I usually use following helpers:
// multi byte to wide char:
std::wstring s2ws(const std::string& str)
{
int size_needed = MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), NULL, 0);
std::wstring wstrTo(size_needed, 0);
MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), &wstrTo[0], size_needed);
return wstrTo;
}
// wide char to multi byte:
std::string ws2s(const std::wstring& wstr)
{
int size_needed = WideCharToMultiByte(CP_ACP, 0, wstr.c_str(), int(wstr.length() + 1), 0, 0, 0, 0);
std::string strTo(size_needed, 0);
WideCharToMultiByte(CP_ACP, 0, wstr.c_str(), int(wstr.length() + 1), &strTo[0], size_needed, 0, 0);
return strTo;
}

How can I fix this code?

This code does not work as expected and I don't know what's wrong.
szTeam should change, but doesn't.
Could anyone explain this?
-----------------------------------------------------
WCHAR szTeam[MAX_PATH] = L"\u7F57\u5207\u8FBE\u5C14\u6D41\u6D6A";
char szMsg[MAX_PATH];
sprintf(szMsg , "%s" , WideStringToMultiByte(szTeam));
swprintf( szTeam , L"%s" , MultiByteToWideString(szMsg));
......
WCHAR* MultiByteToWideString(const char* szSrc)
{
int iSizeOfStr = MultiByteToWideChar(CP_ACP, 0, szSrc, -1, NULL, 0);
wchar_t* wszTgt = new wchar_t[iSizeOfStr];
if(!wszTgt)
return (NULL);
MultiByteToWideChar(CP_ACP, 0, szSrc, -1, wszTgt, iSizeOfStr);
return(wszTgt);
}
char* WideStringToMultiByte(const wchar_t* wszSrc)
{
int iSizeOfStr = WideCharToMultiByte(CP_UTF8, 0, wszSrc, -1, NULL, 0, NULL, NULL);
char* szTgt = new char[iSizeOfStr];
if(!szTgt)
return(NULL);
WideCharToMultiByte(CP_UTF8, 0, wszSrc, -1, szTgt, iSizeOfStr, NULL, NULL);
return szTgt;
}
-----------------------------------------------------
Erm, no szTeam does change. Into something unrecognizable, mojibake. You start out with "罗切达尔流浪" and convert it from its utf-16 encoding to utf-8. That works fine. The debugger won't show you anything recognizable because it neither knows nor cares that szMsg is encoded in utf-8.
You then go wrong though, you are converting that utf-8 string with CP_ACP. Which says that the string is encoded in the default system code page. It is not, it was encoded in utf-8.
Fix your problem with:
WCHAR* MultiByteToWideString(const char* szSrc)
{
int iSizeOfStr = MultiByteToWideChar(CP_UTF8, 0, szSrc, -1, NULL, 0);
// etc..
}
And now szTeam won't change because the string got properly converted back.

utf-8 to/from utf-16 problem

I based these two conversion functions and an answer on StackOverflow, but converting back-and-forth doesn't work:
std::wstring MultiByteToWideString(const char* szSrc)
{
unsigned int iSizeOfStr = MultiByteToWideChar(CP_ACP, 0, szSrc, -1, NULL, 0);
wchar_t* wszTgt = new wchar_t[iSizeOfStr];
if(!wszTgt) assert(0);
MultiByteToWideChar(CP_ACP, 0, szSrc, -1, wszTgt, iSizeOfStr);
std::wstring wstr(wszTgt);
delete(wszTgt);
return(wstr);
}
std::string WideStringToMultiByte(const wchar_t* wszSrc)
{
int iSizeOfStr = WideCharToMultiByte(CP_ACP, 0, wszSrc, -1, NULL, 0, NULL, NULL);
char* szTgt = new char[iSizeOfStr];
if(!szTgt) return(NULL);
WideCharToMultiByte(CP_ACP, 0, wszSrc, -1, szTgt, iSizeOfStr, NULL, NULL);
std::string str(szTgt);
delete(szTgt);
return(str);
}
[...]
// はてなブ in utf-16
wchar_t wTestUTF16[] = L"\u306f\u3066\u306a\u30d6\u306f\u306f";
// shows the text correctly
::MessageBoxW(NULL, wTestUTF16, L"Message", MB_OK);
// convert to UTF8, and back to UTF-16
std::string strUTF8 = WideStringToMultiByte(wTestUTF16);
std::wstring wstrUTF16 = MultiByteToWideString(strUTF8.c_str());
// this doesn't show the proper text. Should be same as first message box
::MessageBoxW(NULL, wstrUTF16.c_str(), L"Message", MB_OK);
Check the docs for WideCharToMultiByte(). CP_ACP converts using the current system code page. That's a very lossy one. You want CP_UTF8.