Firebird crashes on `UTF8 string converted to wstring` - C++

Hi, I am completely new to databases. I am having a problem inserting a row into a table. The incoming string is Unicode (UTF-16) and is converted to UTF-8 using a WideCharToMultiByte call. Then I construct a database query as below:
in = "Weiß" (UTF8 string result of conversion from Unicode to UTF8)
wchar_t buf[2048]; //= new wchar_t[ in.size() ];
size_t num_chars = mbstowcs( buf, in.c_str(), in.size() );
wstring ws( buf, num_chars );
At this point ws = 'WeiÃ'. If I expand ws in the debugger, there are 5 characters in the string and the 5th character is 159:L''.
wostringstream oss;
oss << L"insert into myutftable values("
<< id
<< L"', '"
<< ws
<< L") ";
Then I use SQLExecDirect to execute the statement, and this is where I see the crash.
I am trying to understand why it crashes. I have tried a few things without success; for example, I set the connection character set to UTF8, but no luck. Can anyone tell me what could be causing the crash and how to fix it?
BTW, I am using Firebird version 2.0.
UPDATE 1:
1) If I do not convert the UTF-8 string to a wstring and use the SQLExecDirectA call instead, it works fine. But at this point I do not know the side effects, because that database is accessed from a lot of other places.
2) I have tried pushing the same string from the command line using isql.exe - no issues!
I wonder why the last character was not displayed in my debugger watch list: after the conversion to wchar, the Ÿ character is shown as 159:L''. Looking it up, code 159 refers to Ÿ in the character set. Any ideas why my debugger does not show this character as a wchar, when it displays it fine as a plain char?
Is there a way to debug SQLExecDirect? I am using the OdbcJdbc drivers.
Update 2:
It seems I am having an issue with the Ÿ character in my strings. As long as I send it in as a narrow string there are no issues; if I use a wstring at all, it fails.
I just observed how it is represented in memory: when I send it as 9f (= Ÿ), there are no issues with SQLExecDirectA; if I send it as 9f 00 (= Ÿ as a wchar), it crashes in SQLExecDirect.
My application is built with the Unicode character set. The Firebird database character set is set to NONE.
Any ideas??

mbstowcs assumes its second parameter to be a string in the system default code page, also known as CP_ACP, which is never UTF-8 (also known as CP_UTF8).
The inverse of WideCharToMultiByte is MultiByteToWideChar. Though it's unclear why you want to convert a string from Unicode to UTF-8, only to convert it right back.
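For illustration, here is a minimal sketch of that suggestion - decoding the UTF-8 bytes with MultiByteToWideChar(CP_UTF8, ...) instead of mbstowcs. The variable names are taken from the question; the two-pass length query is the usual idiom:
// Sketch: replace the mbstowcs call with an explicit UTF-8 decode.
// Assumes in is the UTF-8 std::string from the question.
int needed = MultiByteToWideChar(CP_UTF8, 0, in.c_str(), (int)in.size(), NULL, 0);
std::wstring ws(needed, L'\0');
MultiByteToWideChar(CP_UTF8, 0, in.c_str(), (int)in.size(), &ws[0], needed);
// ws now holds L"Weiß" as proper UTF-16 and is safe to pass to SQLExecDirectW.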

Related

QTextBrowser not displaying non-English characters

I'm developing a Qt GUI application to parse a custom Windows binary file that stores Unicode text as wchar_t (UTF-16 on Windows). I've constructed a QString using QString::fromWCharArray and passed it to QTextBrowser::insertPlainText like this:
wchar_t *p = /* pointer to a wchar_t string in the binary file */;
QString t = QString::fromWCharArray(p);
ui.logBrowser->insertPlainText(t);
The text displays ASCII characters correctly, but non-ASCII characters are displayed as rectangular boxes instead. I've followed the code in a debugger: p points to a valid wchar_t string, and the constructed QString t is also a valid string matching the wchar_t string. The problem happens when printing it out on the QTextBrowser.
How do I fix this?
First of all, read the documentation: depending on the system, wchar_t will hold a different encoding - UCS-4 or UTF-16. What is the size of wchar_t on your platform?
Secondly, there is an alternative API: try QString::fromUtf16.
Finally, what kind of characters are you using - Hebrew/Cyrillic/Japanese/...? Are you sure those characters are supported by the font you are using?
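For example, on a Windows build (where sizeof(wchar_t) == 2 and the data is UTF-16), the alternative API could be used like this - a sketch, with p standing in for the string read from the file:
// Sketch, assuming 16-bit wchar_t (Windows):
const wchar_t *p = L"\u3042"; // stand-in for the wchar_t string from the file
QString t = QString::fromUtf16(reinterpret_cast<const ushort *>(p));
ui.logBrowser->insertPlainText(t);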

Convert wide CString to char*

This question has been asked many times, with as many answers - none of which work for me and, it seems, for many others. The question is about wide CStrings and 8-bit chars under MFC. We all want an answer that will work in ALL cases, not in a specific instance.
void Dosomething(CString csFileName)
{
    char cLocFileNamestr[1024];
    char cIntFileNamestr[1024];
    // Convert from whatever version of CString is supplied
    // to an 8 bit char string (placeholder for the sought conversion)
    cIntFileNamestr = ConvertCStochar(csFileName);
    sprintf_s(cLocFileNamestr, "%s_%s", cIntFileNamestr, "pling.txt");
    m_KFile = fopen(cLocFileNamestr, "wt");
}
This is an addition to existing code (by somebody else) for debugging.
I don't want to change the function signature, it is used in many places.
I cannot change the signature of sprintf_s, it is a library function.
You are leaving out a lot of details, or ignoring them. If you are building with UNICODE defined (which it seems you are), then the easiest way to convert to MBCS is like this:
CStringA strAIntFileNameStr = csFileName.GetString(); // uses default code page
CStringA is the 8-bit/MBCS version of CString.
However, it will fill in garbage characters if the Unicode string you are translating from contains characters that do not exist in the default code page.
Instead of using fopen(), you could use _wfopen() which will open a file with a unicode filename. To create your file name, you would use swprintf_s().
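A minimal sketch of that wide-character route, assuming a Unicode build and that m_KFile is a FILE* as in the question (buffer size and format string are illustrative):
void Dosomething(CString csFileName)
{
    wchar_t wLocFileNamestr[1024];
    // Build the file name entirely in wide characters.
    swprintf_s(wLocFileNamestr, L"%s_%s", csFileName.GetString(), L"pling.txt");
    // _wfopen accepts a Unicode file name, so no conversion is needed.
    m_KFile = _wfopen(wLocFileNamestr, L"wt");
}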
an answer that will work in ALL cases, not a specific instance...
There is no such thing.
It's easy to convert "ABCD..." from wchar_t* to char*, but it doesn't work that way with non-Latin languages.
Stick to CString and wchar_t when your project is Unicode.
If you need to upload data to a web page or something similar, then use CW2A and CA2W for UTF-8 and UTF-16 conversions:
CStringW unicode = L"Россия";
MessageBoxW(0, unicode, L"Russian", 0); // should be okay

CStringA utf8 = CW2A(unicode, CP_UTF8);
::MessageBoxA(0, utf8, "format error", 0); // WinAPI doesn't take UTF-8

char buf[1024];
strcpy(buf, utf8);
::MessageBoxA(0, buf, "format error", 0); // same problem

// Send this buf to the web page or other UTF-8 systems;
// it should be compatible with Notepad etc. and
// the text will appear correctly.
ofstream f(L"c:\\stuff\\okay.txt");
f.write(buf, strlen(buf));

// Convert UTF-8 back to UTF-16.
unicode = CA2W(buf, CP_UTF8);
::MessageBoxW(0, unicode, L"okay", 0);

std::wstring to QString conversion with Hiragana

I'm trying to convert text containing Hiragana from a wstring to a QString, so that it can be used on a label's text property. However my code is not working and I'm not sure why that is.
The following conversion method clearly shows that I am doing something wrong:
std::wstring myWString = L"Some Hiragana: あ い う え お";
ui->label->setText(QString::fromStdWString(myWString));
Output: Some Hiragana: ゠ㄠㆠ㈠ãŠ
I can print Hiragana on a label if I put them in a string directly:
ui->label->setText("Some Hiragana: あ い う え お");
Output: Some Hiragana: あ い う え お
That means I can avoid this problem by simply using std::string instead of std::wstring, but I'd like to know why this is happening.
Visual Studio is interpreting the source file as Windows-1252 instead of UTF-8.
As an example, 'あ' in UTF-8 is E3 81 82, but the compiler is reading each byte as a single Windows-1252 char and converting it to the corresponding UTF-16 code points 00E3 201A, which works out as 'ã‚' (81 is either ignored by VS, as it is reserved in Windows-1252, or not printed by Qt if VS happens to convert it to the corresponding C1 control character).
The direct version works because the compiler doesn't perform any conversion and leaves the string as the bytes E3 81 82.
To fix your issue, you will need to inform VS that the file is UTF-8; according to other posts, one way is to ensure the file has a UTF-8 BOM.
The only portable way of fixing this is to use escape sequences instead:
L"Some Hiragana: \u3042 \u3044 \u3046 \u3048 \u304A"

Why does my colon character disappear when I go from char[] to string?

In an old Windows application I'm working on, I need to get a path from an environment variable and then append to it to build a path to a file. The code looks something like this:
static std::string PathRoot; // Private variable stored in class' header file
char EnvVarValue[1024];
if (! GetEnvironmentVariable(L"ENV_ROOT", (LPWSTR) EnvVarValue, 1024))
{
    cout << "Could not retrieve the ROOT env variable" << endl;
    return;
}
else
{
    PathRoot = EnvVarValue;
}
// Added just for testing purposes - Returning -1
int foundAt = PathRoot.find_first_of(':');
std::string FullFilePath = PathRoot;
FullFilePath.append("\\data\\Config.xml");
The environment value for ENV_ROOT is set to "c:\RootDir" in the Windows System Control Panel. But when I run the program, I keep ending up with a string in FullFilePath that is missing the colon and everything that followed it in the root folder. It looks like this: "c\data\Config.xml".
Using the Visual Studio debugger, I looked at EnvVarValue after stepping past the GetEnvironmentVariable line, and it shows an array that seems to have all the characters I'd expect, including the colon. But after it gets assigned to PathRoot, mousing over PathRoot only shows the C, and drilling down it says something about a bad ptr. As I noted, the find_first_of() call doesn't find the colon, and when the append is done, the result only keeps the initial C and drops the rest of the RootDir value.
So there seems to be something about the colon character that is confusing the string constructor. Yes, there are a number of ways I could work around this by leaving the colon out of the env variable and adding it later in the code. But I'd prefer to find a way to have the value read and used properly from the environment variable as it is.
You cannot simply cast a char* to a wchar_t* (by casting to LPWSTR) and expect things to work. The two are fundamentally distinct types, and in Windows they signify different encodings.
You obviously have the WinAPI defines set such that GetEnvironmentVariable resolves to GetEnvironmentVariableW, which encodes the string as UTF-16. In practice, this means a 0 byte follows every ASCII character in memory.
You then construct a std::string out of this, which takes the first 0 byte (at char index 1) as the string terminator, so you get just "c".
You have several options:
Use std::wstring and wchar_t EnvVarValue[1024]; (a sketch follows this list)
Call GetEnvironmentVariableA() (which uses char and the ANSI code page)
Use wchar_t EnvVarValue[1024]; and convert the returned value to a std::string using something like wcstombs.
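A minimal sketch of the first option, keeping the names from the question and assuming a Unicode build:
static std::wstring PathRoot;

wchar_t EnvVarValue[1024];
if (! GetEnvironmentVariableW(L"ENV_ROOT", EnvVarValue, 1024))
{
    wcout << L"Could not retrieve the ROOT env variable" << endl;
    return;
}
PathRoot = EnvVarValue;

size_t foundAt = PathRoot.find_first_of(L':'); // now finds the colon at index 1

std::wstring FullFilePath = PathRoot;
FullFilePath.append(L"\\data\\Config.xml");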
It seems you are building with the wide-character functions (as indicated by your cast to LPWSTR). This means that the string in EnvVarValue is a wide-character string, and you should be using wchar_t and std::wstring instead.
I would guess that the contents of the array after the GetEnvironmentVariable call are actually the bytes 0x43 0x00 0x3a 0x00 0x5c 0x00 etc. (that is, the wide-character representation of "C:\"). The first zero acts as the string terminator for a narrow-character string, which is why the narrow-character string PathRoot only contains the 'C'.
The problem might be that EnvVarValue is not a wchar_t array. Try using wchar_t and std::wstring.

String encoding VB6 / C++ dll

I am having a problem with some characters in 2 strings that my program uses.
String #1 is filled using VB code that gets data from a 3rd party application.
String #2 gets similar data from the same 3rd party application, but it gets it with a C++ DLL and sends it to VB.
The data has some weird symbols in it.
I don't know a whole lot about encoding and different character sets, but I'll try to explain it the best I can.
I will use "Т" as my example character.
"Т" (note this isnt a normal capital t) it is unicode decimal value 1058
http://www.unicodemap.org/details/0x0422/index.html
When this character appears in String #1 at runtime, it appears as "?", which I believe is just how VB6 shows some Unicode characters. When I use AscW on the character, it returns the correct value of 1058.
When I output the string to a text file, it appears as "?".
The same character in String #2 from the C++ DLL appears as the 2 characters "Т".
When I output that string to a text file, the character appears properly as "Т".
I was only outputting things to text files for testing purposes; I only need the 2 strings to be encoded, and to appear, the same at run time.
Any idea what's going on here? Any way for me to get these characters to appear the same in both strings?
Thanks
Edit: also, the C++ DLL is built with the multi-byte character set and sends the data in a BSTR.
CODE IN C++ DLL
allChat is a CString
BSTR Message;
int len = allChat.GetLength();
Message = SysAllocStringByteLen((LPCTSTR)allChat, len + 1);
Message is returned to the VB app, and nothing happens to the string after that.
String #1 is just a regular VB string
From the way the Cyrillic "Т" becomes "Т", you are getting your string UTF-8 encoded (I verified that with Notepad++ by switching encodings). You need to convert it to Unicode (UTF-16) before sending it to your VB app. Note that your VB app needs to be Unicode, not ASCII.
You can convert UTF-8 to std::wstring with this function:
#include <windows.h>
#include <string>
#include <vector>

std::wstring utf8to16(const char* src)
{
    std::vector<wchar_t> buffer;
    // First call computes the required length, including the terminator.
    buffer.resize(MultiByteToWideChar(CP_UTF8, 0, src, -1, 0, 0));
    MultiByteToWideChar(CP_UTF8, 0, src, -1, &buffer[0], (int)buffer.size());
    return &buffer[0];
}
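On the DLL side this could then be used, for instance, like the following hypothetical adaptation of the question's snippet (not code from the original answer):
// Hypothetical: convert the CString's UTF-8 bytes to UTF-16 first,
// then allocate the BSTR from wide characters instead of raw bytes.
std::wstring wide = utf8to16((LPCSTR)allChat); // MBCS build: LPCTSTR is LPCSTR
BSTR Message = SysAllocStringLen(wide.c_str(), (UINT)wide.length());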