String encoding VB6 / C++ dll - c++

I am having a problem with some characters in 2 strings that my program uses.
String #1 is filled using VB code that gets data from a 3rd party application.
String #2 gets similar data from the same 3rd party application, but it gets it with a C++ dll and sends it to VB.
The data has some weird symbols in it.
I don't know a whole lot about encoding and different character sets, but I'll try to explain it the best I can.
I will use "Т" as my example character.
"Т" (note this isn't a normal capital T); it is Unicode code point U+0422, decimal value 1058
http://www.unicodemap.org/details/0x0422/index.html
When this character appears in String #1 at runtime it appears as "?", which I believe is just how VB6 displays some Unicode characters. When I use AscW on the character it returns the correct value of 1058.
When I output the string to a text file, it appears as "?".
The same character in String #2 from the C++ DLL appears as two characters: "Т".
When I output that string to a text file, the character appears properly as "Т".
I was only outputting to text files for testing purposes. I just need the two strings to be encoded the same, and appear the same, at runtime.
Any idea what's going on here? Is there any way to get these characters to appear the same in both strings?
Thanks
Edit: also, the C++ DLL is built with the multi-byte character set and sends the data in a BSTR string.
CODE IN C++ DLL (allChat is a CString):

BSTR Message;
int len = allChat.GetLength();
Message = SysAllocStringByteLen((LPCTSTR)allChat, len + 1);
Message is returned to the VB app, and nothing happens to the string after that.
String #1 is just a regular VB string

From the way the Cyrillic "Т" becomes "Т", you are getting your string as a UTF-8 encoded string (I verified that with Notepad++ by switching encodings). You need to convert it to UTF-16 before sending it to your VB app. Note that your VB app needs to be Unicode-aware, not ANSI.
You can convert UTF8 to std::wstring with this function:
#include <windows.h>
#include <string>
#include <vector>

std::wstring utf8to16( const char* src )
{
    std::vector<wchar_t> buffer;
    buffer.resize(MultiByteToWideChar(CP_UTF8, 0, src, -1, 0, 0));
    MultiByteToWideChar(CP_UTF8, 0, src, -1, &buffer[0], (int)buffer.size());
    return &buffer[0];
}
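The mojibake itself can be reproduced without any Windows calls. Here is an illustrative sketch (not part of the original answer, and the function name is made up): it shows the exact bytes the DLL hands to VB6 for "Т", which an ANSI code page then renders as two separate characters.

```cpp
#include <string>

// The UTF-8 encoding of U+0422 (Cyrillic Te) is the two bytes 0xD0 0xA2.
// SysAllocStringByteLen copies those raw bytes into the BSTR unchanged, so
// VB6 sees two one-byte characters; under code page 1252 they render as
// "Ð" and "¢", which is the two-character garbage from the question.
std::string utf8_of_cyrillic_te()
{
    return "\xD0\xA2"; // the bytes actually crossing the DLL boundary
}
```

Once the bytes are converted to UTF-16 (for example with MultiByteToWideChar and CP_UTF8, as in the answer's helper), allocating the BSTR from the resulting wide string with SysAllocString keeps the character intact.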

Related

Only showing one character while printing in C++

This is my code:
auto text = new wchar_t[WCHAR_MAX];
GetWindowTextW(hEdit, text, WCHAR_MAX);
SetWindowTextW(hWnd, text);
printf_s((const char *)text);
When printing the text, it only outputs one character to the console.
It is a WinAPI GUI and a console running together. It sets the window title successfully and gets the text successfully, but I have no idea why it only prints one character to the console...
You're performing a raw cast from a wide string to a narrow string. This conversion is never safe.
Wide strings are stored as two-byte words in Windows. In your case, the high byte of the first character is 0, and x86 is little-endian, so the print stops at the first character.
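A minimal sketch of the effect (the helper names are invented for illustration, and snprintf is used so the result can be inspected rather than printed):

```cpp
#include <cstdio>
#include <cstring>

// Casting a wide string to char* and formatting it with "%s" stops at the
// first zero byte; on a little-endian machine that is the second byte of
// the first character.  "%ls" tells the narrow printf family to convert
// the wide string properly instead.
int chars_printed_via_cast(const wchar_t* text)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, "%s", (const char*)text); // the bug from the question
    return (int)std::strlen(buf);
}

int chars_printed_via_ls(const wchar_t* text)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, "%ls", text); // correct wide-to-narrow conversion
    return (int)std::strlen(buf);
}
```

So `printf_s("%ls\n", text);` (or `wprintf_s`) would print the whole string, while the raw cast prints one character.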

QTextBrowser not displaying non-english characters

I'm developing a Qt GUI application to parse out a custom windows binary file that stores unicode text using wchar_t (default UTF-16 encoding). I've constructed a QString using QString::fromWcharArray and passed it to QTextBrowser::insertPlainText like this
wchar_t *p = ; // pointer to a wchar_t string in the binary file
QString t = QString::fromWCharArray(p);
ui.logBrowser->insertPlainText(t);
The displayed text displays ASCII characters correctly, but non-ASCII characters are displayed as a rectangular box instead. I've followed the code in a debugger and p points to a valid wchar_t string and the constructed QString t is also a valid string matching the wchar_t string. The problem happens when printing it out on a QTextBrowser.
How do I fix this?
First of all, read the documentation: depending on the system, wchar_t strings will be encoded as UCS-4 or UTF-16. What is sizeof(wchar_t) on your platform?
Secondly, there is an alternative API: try QString::fromUtf16.
Finally, what kind of characters are you using: Hebrew, Cyrillic, Japanese? Are you sure those characters are supported by the font you are using?
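The size question is the crux; a tiny sketch (illustrative, not from the original answer) of what differs across platforms:

```cpp
#include <cstddef>

// On Windows, wchar_t is 2 bytes and wide strings are UTF-16; on most Unix
// systems it is 4 bytes and wide strings are UCS-4/UTF-32.  Qt's
// QString::fromWCharArray handles both cases, while QString::fromUtf16
// always expects 16-bit units, which is why it is suggested as an
// alternative when the file is known to contain UTF-16 data.
bool wide_is_utf16_sized() { return sizeof(wchar_t) == 2; }
bool wide_is_utf32_sized() { return sizeof(wchar_t) == 4; }
```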

Convert wide CString to char*

There are lots of times this question has been asked and as many answers - none of which work for me and, it seems, many others. The question is about wide CStrings and 8bit chars under MFC. We all want an answer that will work in ALL cases, not a specific instance.
void Dosomething(CString csFileName)
{
    char cLocFileNamestr[1024];
    char cIntFileNamestr[1024];
    // Convert from whatever version of CString is supplied
    // to an 8 bit char string (pseudocode; this is the conversion I need)
    cIntFileNamestr = ConvertCStochar(csFileName);
    sprintf_s(cLocFileNamestr, "%s_%s", cIntFileNamestr, "pling.txt");
    m_KFile = fopen(cLocFileNamestr, "wt");
}
This is an addition to existing code (by somebody else) for debugging.
I don't want to change the function signature, it is used in many places.
I cannot change the signature of sprintf_s, it is a library function.
You are leaving out a lot of details, or ignoring them. If you are building with UNICODE defined (which it seems you are), then the easiest way to convert to MBCS is like this:
CStringA strAIntFileNameStr = csFileName.GetString(); // uses default code page
CStringA is the 8-bit/MBCS version of CString.
However, it will substitute default (garbage) characters if the Unicode string you are translating contains characters that are not in the default code page.
Instead of using fopen(), you could use _wfopen() which will open a file with a unicode filename. To create your file name, you would use swprintf_s().
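A sketch of that wide-string route, under the assumption of a UNICODE build where CString holds wchar_t (the helper name is invented for illustration; _wfopen itself is Windows-only, so it is shown as a comment):

```cpp
#include <cwchar>
#include <string>

// Build the debug file name entirely in wide characters, so no code-page
// conversion (and therefore no garbage characters) ever takes place.
std::wstring make_debug_filename(const wchar_t* base)
{
    wchar_t buf[1024];
    std::swprintf(buf, 1024, L"%ls_%ls", base, L"pling.txt");
    // On Windows one would then open it with: FILE* f = _wfopen(buf, L"wt");
    return buf;
}
```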
an answer that will work in ALL cases, not a specific instance...
There is no such thing.
It's easy to convert "ABCD..." from wchar_t* to char*, but it doesn't work that way with non-Latin languages.
Stick with CString and wchar_t when your project is Unicode.
If you need to send data to a web page or another UTF-8 system, use CW2A and CA2W for the UTF-8/UTF-16 conversions.
CStringW unicode = L"Россия";
MessageBoxW(0, unicode, L"Russian", 0);      // should be okay
CStringA utf8 = CW2A(unicode, CP_UTF8);
::MessageBoxA(0, utf8, "format error", 0);   // WinAPI doesn't take UTF-8
char buf[1024];
strcpy(buf, utf8);
::MessageBoxA(0, buf, "format error", 0);    // same problem
// Send this buf to a web page or other UTF-8 systems; it is compatible
// with Notepad etc., and the text will appear correctly.
std::ofstream f(L"c:\\stuff\\okay.txt");
f.write(buf, strlen(buf));
// Convert UTF-8 back to UTF-16:
unicode = CA2W(buf, CP_UTF8);
::MessageBoxW(0, unicode, L"okay", 0);

Firebird crashes on `UTF8 string converted to wstring`

Hi, I am completely new to databases, and I am having a problem inserting a row into a table. The incoming string is Unicode and is converted to UTF-8 using a WideCharToMultiByte call. Then I construct a database query as below.
in = "Weiß" (UTF8 string result of conversion from Unicode to UTF8)
wchar_t buf[2048]; //= new wchar_t[ in.size() ];
size_t num_chars = mbstowcs( buf, in.c_str(), in.size() );
wstring ws( buf, num_chars );
Here I have the string ws = 'WeiÃ' If I expand ws to see there are 5 characters in the string and the 5th character is 159:L''.
wostringstream oss;
oss << L"insert into myutftable values("
<< id
<< L"', '"
<< ws
<< L") ";
Then I am using SQLExecDirect to update the database. This is the place I am seeing the crash.
I am trying to understand why it crashes. I have tried a few things, unsuccessfully. I used character set = utf8 but no luck. Can anyone tell me what could be the reason for the crash and how to fix it?
BTW I am using Firebird database version 2.0.
UPDATE1:
1) If I do not convert the UTF-8 string to a wstring and use the SQLExecDirectA call instead, it works fine. But I do not know the side effects, because that database is accessed from a lot of other places.
2) I have tried pushing the same string from command line using isql.exe, no issues!
I wonder why the last character is not displayed in my debugger's watch list: after converting to wchar it is shown as 159:L''. Looking it up, character 159 in the character set is Ÿ. Any ideas why my debugger does not show this character as a wchar, while it displays it fine as a plain char?
Is there a way to Debug SQLExecDirect? I am using OdbcJdbc Drivers.
Update 2:
It seems the problem is with the Ÿ character in my strings. As long as I send the data as a narrow string there are no issues; if I use wstring at all, it fails.
I observed in memory how it is represented: if I send it as the byte 9f (= Ÿ), SQLExecDirectA has no issues; if I send it as 9f 00 (= Ÿ as a wchar), SQLExecDirect crashes.
My application is built with the Unicode character set. The Firebird database character set is set to NONE.
Any ideas??
mbstowcs assumes its second parameter to be a string in the system default code page, also known as CP_ACP, which is never UTF-8 (also known as CP_UTF8).
The inverse of WideCharToMultiByte is MultiByteToWideChar. Though it's unclear why you want to convert a string from Unicode to UTF-8, only to convert it right back.
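To make the difference concrete, here is a small hand-rolled decoder for one- and two-byte UTF-8 sequences (illustration only; on Windows you would simply call MultiByteToWideChar with CP_UTF8, and mbstowcs would never be involved):

```cpp
#include <string>

// Decode ASCII and two-byte UTF-8 sequences into a wide string.  "Weiß" in
// UTF-8 is the bytes 57 65 69 C3 9F; the pair C3 9F decodes to U+00DF (ß).
// Feeding the same bytes to mbstowcs under an ANSI code page instead yields
// the five characters "WeiÃ" + 0x9F seen in the question.
std::wstring decode_utf8_basic(const std::string& in)
{
    std::wstring out;
    for (std::size_t i = 0; i < in.size(); ) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x80) {                                   // one byte: ASCII
            out += (wchar_t)c;
            i += 1;
        } else if ((c & 0xE0) == 0xC0 && i + 1 < in.size()) {
            // two bytes: 110xxxxx 10xxxxxx -> code points U+0080..U+07FF
            out += (wchar_t)(((c & 0x1F) << 6) | ((unsigned char)in[i + 1] & 0x3F));
            i += 2;
        } else {
            out += L'?';                                  // longer sequences: out of scope here
            i += 1;
        }
    }
    return out;
}
```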

Storing and retrieving UTF-8 strings from Windows resource (RC) files

I created an RC file which contains a string table, and I would like to use some special characters: ö ü ó ú ő ű á é. So I saved the string table with UTF-8 encoding.
But when I call something like this in my cpp file:
LoadString("hu.dll", 12, nn, MAX_PATH);
I get a weird result: the characters come out garbled.
How do I solve this problem?
As others have pointed out in the comments, the Windows APIs do not provide direct support for UTF-8 encoded text. You cannot pass the MessageBox function UTF-8 encoded strings and get the output that you expect. It will, instead, interpret them as characters in your local code page.
To get a UTF-8 string to pass to the Windows API functions (including MessageBox), you need to use the MultiByteToWideChar function to convert from UTF-8 to UTF-16 (what Windows calls Unicode, or wide strings). Passing the CP_UTF8 flag for the first parameter is the magic that enables this conversion. Example:
std::wstring ConvertUTF8ToUTF16String(const char* pszUtf8String)
{
    // Determine the size required for the destination buffer.
    const int length = MultiByteToWideChar(CP_UTF8,
                                           0,              // no flags required
                                           pszUtf8String,
                                           -1,             // automatically determine length
                                           nullptr,
                                           0);
    // Allocate a buffer of the appropriate length.
    std::wstring utf16String(length, L'\0');
    // Call the function again to do the conversion.
    if (!MultiByteToWideChar(CP_UTF8,
                             0,
                             pszUtf8String,
                             -1,
                             &utf16String[0],
                             length))
    {
        // Uh-oh! Something went wrong.
        // Handle the failure condition, perhaps by throwing an exception.
        // Call the GetLastError() function for additional error information.
        throw std::runtime_error("The MultiByteToWideChar function failed");
    }
    // Return the converted UTF-16 string.
    return utf16String;
}
Then, once you have a wide string, you will explicitly call the wide-string variant of the MessageBox function, MessageBoxW.
However, if you only need to support Windows and not other platforms that use UTF-8 everywhere, you will probably have a much easier time sticking exclusively with UTF-16 encoded strings. This is the native Unicode encoding that Windows uses, and you can pass these types of strings directly to any of the Windows API functions. See my answer here to learn more about the interaction between Windows API functions and strings. I recommend the same thing to you as I did to the other guy:
Stick with wchar_t and std::wstring for your characters and strings, respectively.
Always call the W variants of Windows API functions, including LoadStringW and MessageBoxW.
Ensure that the UNICODE and _UNICODE macros are defined either before you include any of the Windows headers or in your project's build settings.