Wrong encoding when getting a string from MySQL database with C++ - c++

I'm writing an MFC app in C++ with Visual Studio 2012. The app connects to a MySQL database and shows every row in a list box.
Words are in Russian, database encoding is cp1251. I've set the same character set using this code:
if (!mysql_set_character_set(mysql, "cp1251")) {
statusBox.SetWindowText((CString)"CP1251 is set for MYSQL.");
}
But it doesn't help at all.
I display data using this code:
while ((row = mysql_fetch_row(result)) != NULL) {
CString string = (CString)row[1];
listBox.AddString(string);
}
This code also doesn't help:
mysql_query(mysql, "set names cp1251");
Please help. What should I do to display Cyrillic correctly?

When crossing system boundaries that use different character encodings you have to convert between them. In this case, the MySQL database uses CP 1251 while Windows (and CString) use UTF-16. The conversion might look like this:
#if !defined(_UNICODE)
#error Unicode configuration required
#endif
CString CPtoUnicode( const char* CPString, UINT CodePage ) {
CString retValue;
// Retrieve required string length
int len = MultiByteToWideChar( CodePage, 0,
CPString, -1,
NULL, 0 );
if ( len == 0 ) {
// Error -> return empty string
return retValue;
}
// Allocate CString's internal buffer
LPWSTR buffer = retValue.GetBuffer( len );
// Do the conversion
MultiByteToWideChar( CodePage, 0,
CPString, -1,
buffer, len );
// Return control of the buffer back to the CString object
retValue.ReleaseBuffer();
return retValue;
}
This should be used as follows:
while ( ( row = mysql_fetch_row( result ) ) != NULL ) {
CString string = CPtoUnicode( row[1], 1251 );
listBox.AddString( string );
}
Alternatively, you could use CString's built-in conversion support, which requires setting the thread's locale to the source encoding (CP 1251) and using the conversion constructor.
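To make the byte-to-character mapping concrete, here is a minimal portable sketch of what the conversion does for Russian text. It is not a replacement for MultiByteToWideChar: the helper name DecodeCp1251Russian is hypothetical, and it only handles ASCII, the letters at 0xC0..0xFF (which CP 1251 maps to U+0410..U+044F), and the two letters at 0xA8 and 0xB8. Every other byte is replaced with U+FFFD as a simplification.

```cpp
#include <string>

// Hypothetical helper, not part of MFC or the MySQL client library.
// Decodes the ASCII and Russian-alphabet portion of CP 1251 into UTF-16.
std::u16string DecodeCp1251Russian(const std::string& bytes) {
    std::u16string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            // ASCII bytes keep their value.
            out.push_back(static_cast<char16_t>(b));
        } else if (b >= 0xC0) {
            // 0xC0..0xFF map linearly onto U+0410..U+044F.
            out.push_back(static_cast<char16_t>(0x0410 + (b - 0xC0)));
        } else if (b == 0xA8) {
            out.push_back(static_cast<char16_t>(0x0401)); // capital IO
        } else if (b == 0xB8) {
            out.push_back(static_cast<char16_t>(0x0451)); // small io
        } else {
            // Assumption of this sketch: anything else becomes U+FFFD.
            out.push_back(static_cast<char16_t>(0xFFFD));
        }
    }
    return out;
}
```

Casting the raw bytes straight to CString skips this mapping, which is why the list box shows the wrong characters.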

Related

MFC C++ Derive from CEdit and derive GetWindowText

I am deriving from CEdit to make a custom control. It would be nice if, like the MFC Feature Pack controls (Mask, Browsable), I could change GetWindowText so that it reports something other than what is normally displayed in the control (for example, convert the data between hex and decimal, then return that string).
Is this possible in a derived CEdit?
Add message map entries for WM_GETTEXT and WM_GETTEXTLENGTH to your derived CEdit class:
BEGIN_MESSAGE_MAP( CMyEdit, CEdit )
ON_WM_GETTEXT()
ON_WM_GETTEXTLENGTH()
END_MESSAGE_MAP()
As we are overriding these messages we need a method of getting the original text of the edit control without going into endless recursion. For this we can directly call the default window procedure which is named DefWindowProc:
CStringW CMyEdit::GetTextInternal()
{
CStringW text;
LRESULT len = DefWindowProcW( WM_GETTEXTLENGTH, 0, 0 );
if( len > 0 )
{
// WPARAM = len + 1 because the length must include the null terminator.
len = DefWindowProcW( WM_GETTEXT, len + 1, reinterpret_cast<LPARAM>( text.GetBuffer( len ) ) );
text.ReleaseBuffer( len );
}
return text;
}
The following method gets the original window text and transforms it. Anything would be possible here, including the example of converting between hex and dec. For simplicity I just enclose the text in dashes.
CStringW CMyEdit::GetTransformedText()
{
CStringW text = GetTextInternal();
return L"--" + text + L"--";
}
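As a sketch of the hex/dec transformation mentioned above, the following portable helper converts decimal text to uppercase hexadecimal text. The name DecimalTextToHex is hypothetical, and returning the input unchanged when it is not a plain non-negative decimal number is an assumption of this sketch, not something the control requires.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: turns decimal text such as L"255" into L"FF".
// Input that is not a plain non-negative decimal number is returned unchanged.
std::wstring DecimalTextToHex(const std::wstring& text) {
    if (text.empty() ||
        text.find_first_not_of(L"0123456789") != std::wstring::npos) {
        return text;
    }
    unsigned long long value = 0;
    try {
        value = std::stoull(text);
    } catch (const std::out_of_range&) {
        return text; // too large for 64 bits: leave the text as it is
    }
    std::wostringstream out;
    out << std::hex << std::uppercase << value;
    return out.str();
}
```

GetTransformedText could then pass the internal text through such a helper instead of adding dashes.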
Now comes the actual handler for WM_GETTEXT which copies the transformed text to the output buffer.
int CMyEdit::OnGetText( int cchDest, LPWSTR pDest )
{
// Sanity checks
if( cchDest <= 0 || ! pDest )
return 0;
CStringW text = GetTransformedText();
// Using StringCchCopyExW() to make sure that we don't write outside of the bounds of the pDest buffer.
// cchDest defines the maximum number of characters to be copied, including the terminating null character.
LPWSTR pDestEnd = nullptr;
HRESULT hr = StringCchCopyExW( pDest, cchDest, text.GetString(), &pDestEnd, nullptr, 0 );
// If our text is greater in length than cchDest - 1, the function will truncate the text and
// return STRSAFE_E_INSUFFICIENT_BUFFER.
if( SUCCEEDED( hr ) || hr == STRSAFE_E_INSUFFICIENT_BUFFER )
{
// The return value is the number of characters copied, not including the terminating null character.
return pDestEnd - pDest;
}
return 0;
}
The handler for WM_GETTEXTLENGTH is self-explanatory:
UINT CMyEdit::OnGetTextLength()
{
return GetTransformedText().GetLength();
}
Thanks to everyone for pointing me in the right direction. I tried OnGetText, but I couldn't get at the underlying string: calling GetWindowText either crashed or just called OnGetText again.
After seeing what they did in the masked control, I went with a simpler approach like this. Are there any drawbacks? It doesn't seem to cause any issues or side effects...
Override GetWindowText directly:
void CConvertibleEdit::GetWindowText(CString& strString) const
{
    CEdit::GetWindowText(strString);
    // Convert only when the two data types differ.
    if (currentDataType != inputType)
    {
        strString = ConvertEditType(strString, currentDataType, inputType);
    }
}

How to get the current Tab Item name from CTabCtrl in MFC?

I am trying to get the text of the currently chosen tab in CTabCtrl.
int tabCurSel = currentTabCtrl->GetCurSel();
TCITEM tcItem;
tcItem.mask = TCIF_TEXT;
tcItem.cchTextMax = 256; //Do I need this?
CString tabCurrentCString;
currentTabCtrl->GetItem(tabCurSel, &tcItem);
tabCurrentCString = tcItem.pszText;
CT2A tabCurrentChar(tabCurrentCString);
std::string tabCurrentStr(tabCurrentChar);
return tabCurrentStr;
I clearly have some unnecessary string conversions, and currently this produces an "Error reading characters of the string" in
tcItem.pszText;
How can I get the string from the CTabCtrl? I ultimately am trying to get an std::string but the main question is how to get the text from the tab.
tcItem.pszText is pointing to 0. To fill it with text, it has to point to a buffer before a call is made to GetItem:
Documentation for: CTabCtrl::GetItem
pszText
Pointer to a null-terminated string containing the tab text if the
structure contains information about a tab. If the structure is
receiving information, this member specifies the address of the buffer
that receives the tab text.
Example:
TCITEM tcItem { 0 };
tcItem.mask = TCIF_TEXT;
const int len = 256;
tcItem.cchTextMax = len;
TCHAR buf[len] = { 0 };
tcItem.pszText = buf;
currentTabCtrl->GetItem(tabCurSel, &tcItem);
Both tcItem.pszText and buf will point to the same text. Or use CString with CString::GetBuffer()/CString::ReleaseBuffer()
CString tabCurrentCString;
TCITEM tcItem;
tcItem.mask = TCIF_TEXT;
tcItem.cchTextMax = 256;
tcItem.pszText = tabCurrentCString.GetBuffer(tcItem.cchTextMax);
BOOL result = currentTabCtrl->GetItem(tabCurSel, &tcItem);
tabCurrentCString.ReleaseBuffer();
if (result)
MessageBox(tabCurrentCString); //success
It looks like you are using the recommended Unicode settings. Avoid converting Unicode to ANSI (std::string). That conversion works for Latin-script languages most of the time, but it isn't needed. You can use std::wstring if you need the text in the STL, or convert to UTF-8 if you want to send the data over the internet, for example:
std::string str = CW2A(tabCurrentCString, CP_UTF8);

Convert variant from recordset (with currency value) into properly formatted string

I have a query that gets the amount of money paid. This field is of the Currency type in an MS Access database.
I need to display this value in a text box (I am using C++ and raw WinAPI for the GUI), so I need to know how to convert the _variant_t from the recordset into a properly formatted string (1,200.55).
Here is an example ( remember, I use raw WinAPI and C++ for GUI ):
SetDlgItemText(hDlg, IDC_EDIT11,
pRS->Fields->GetItem(L"PaidValue")->Value.bstrVal); // problem is this line
My textbox is empty when I run the program.
When I debug it, it reports no errors.
QUESTION:
How can I convert _variant_t into string ( 1,200.00)?
CString GetString(_variant_t vValue)
{
USES_CONVERSION;
CString strValue("");
if ((V_VT(&vValue) == VT_NULL) || (V_VT(&vValue) == VT_EMPTY))
return strValue;
if ((V_VT(&vValue) == VT_BSTR) || SUCCEEDED(VariantChangeType(&vValue, &vValue, 0, VT_BSTR)))
strValue = OLE2T(V_BSTR(&vValue));
return strValue;
}
CString strVal = GetString(pRS->Fields->Item["PaidValue"]->Value);
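If the string produced this way lacks the thousands grouping asked for in the question, the grouping can be applied by hand. A Currency value is stored as a 64-bit integer scaled by 10,000 (the int64 member of the CY structure), so a minimal portable sketch can work on that integer directly. The helper name FormatCurrency, the fixed "," and "." separators, and rounding to two decimals are assumptions of this sketch.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: formats a Currency amount (integer scaled by 10,000)
// as text with comma grouping and two decimals, e.g. 12005500 -> "1,200.55".
std::string FormatCurrency(std::int64_t scaled) {
    const bool negative = scaled < 0;
    // Use the unsigned magnitude so the most negative value does not overflow.
    const std::uint64_t magnitude = negative
        ? 0 - static_cast<std::uint64_t>(scaled)
        : static_cast<std::uint64_t>(scaled);
    // Round from four decimals to two (half away from zero).
    const std::uint64_t cents = (magnitude + 50) / 100;
    std::string digits = std::to_string(cents / 100);
    const unsigned fraction = static_cast<unsigned>(cents % 100);
    // Insert a comma before every group of three digits, right to left.
    for (int i = static_cast<int>(digits.size()) - 3; i > 0; i -= 3) {
        digits.insert(static_cast<std::size_t>(i), ",");
    }
    std::string out = negative ? "-" : "";
    out += digits;
    out += '.';
    if (fraction < 10) out += '0';
    out += std::to_string(fraction);
    return out;
}
```

The result would still need converting to the character type expected by SetDlgItemText.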

How to set HTML Unicode text to clipboard in VC++?

I am a newbie to C++. I want to get the content of the clipboard, which might contain Unicode characters, append a div tag with some HTML-formatted content, and set the result back to the clipboard.
I have succeeded in getting the content and appending to it, but I could not set it back to the clipboard as HTML text. Setting it as plain text works. Here is my code:
#include <shlwapi.h>
#include <iostream>
#include <conio.h>
#include <stdio.h>
using namespace std;
wstring getClipboard(){
if (OpenClipboard(NULL)){
HANDLE clip = GetClipboardData(CF_UNICODETEXT);
WCHAR * c;
c = (WCHAR *)clip;
CloseClipboard();
return (WCHAR *)clip;
}
return L"";
}
bool setClipboard(wstring textToclipboard)
{
if (OpenClipboard(NULL)){
EmptyClipboard();
HGLOBAL hClipboardData;
size_t size = (textToclipboard.length()+1) * sizeof(WCHAR);
hClipboardData = GlobalAlloc(NULL, size);
WCHAR* pchData = (WCHAR*)GlobalLock(hClipboardData);
memcpy(pchData, textToclipboard.c_str(), size);
SetClipboardData(CF_UNICODETEXT, hClipboardData);
GlobalUnlock(hClipboardData);
CloseClipboard();
return true;
}
return false;
}
int main (int argc, char * argv[])
{
wstring s = getClipboard();
s += std::wstring(L"some extra text <b>hello</b>");
setClipboard(s);
getch();
return 0;
}
I did try using the code described here and read the doc here. But I couldn't make it work. What I tried could be way off track or completely wrong.
Update: The code below is what I tried after the modifications suggested by Cody Gray to the original code presented here:
bool CopyHTML2(WCHAR *html ){
wchar_t *buf = new wchar_t [400 + wcslen(html)];
if(!buf) return false;
static int cfid = 0;
if(!cfid) cfid = RegisterClipboardFormat("HTML Format");
// Create a template string for the HTML header...
wcscpy(buf,
L"Version:0.9\r\n"
L"StartHTML:00000000\r\n"
L"EndHTML:00000000\r\n"
L"StartFragment:00000000\r\n"
L"EndFragment:00000000\r\n"
L"<html><body>\r\n"
L"<!--StartFragment -->\r\n");
// Append the HTML...
wcscat(buf, html);
wcscat(buf, L"\r\n");
// Finish up the HTML format...
wcscat(buf,
L"<!--EndFragment-->\r\n"
L"</body>\r\n"
L"</html>");
wchar_t *ptr = wcsstr(buf, L"StartHTML");
wsprintfW(ptr+10, L"%08u", wcsstr(buf, L"<html>") - buf);
*(ptr+10+8) = L'\r';
ptr = wcsstr(buf, L"EndHTML");
wsprintfW(ptr+8, L"%08u", wcslen(buf));
*(ptr+8+8) = '\r';
ptr = wcsstr(buf, L"StartFragment");
wsprintfW(ptr+14, L"%08u", wcsstr(buf, L"<!--StartFrag") - buf);
*(ptr+14+8) = '\r';
ptr = wcsstr(buf, L"EndFragment");
wsprintfW(ptr+12, L"%08u", wcsstr(buf, L"<!--EndFrag") - buf);
*(ptr+12+8) = '\r';
// Open the clipboard...
if(OpenClipboard(0)) {
EmptyClipboard();
HGLOBAL hText = GlobalAlloc(GMEM_MOVEABLE |GMEM_DDESHARE, wcslen(buf)+4);
wchar_t *ptr = (wchar_t *)GlobalLock(hText);
wcscpy(ptr, buf);
GlobalUnlock(hText);
SetClipboardData(cfid, hText);
CloseClipboard();
GlobalFree(hText);
}
// Clean up...
delete [] buf;
return true;
}
This code compiles successfully, But I get the following error at SetClipboardData : HEAP[Project1.exe]: Heap block at 007A8530 modified at 007A860A past requested size of d2
Project1.exe has triggered a breakpoint.
Please guide me on how to proceed. I am using Visual Studio Express 2012 on Windows 8. Thanks.
You're mismatching ANSI (narrow) and Unicode (wide) strings.
Unlike the wcscpy function, the w in the wsprintf function doesn't stand for "wide", it stands for "Windows". It is part of the Win32 API, rather than the C runtime library. All of the Win32 API functions that work with strings have two versions, one suffixed with an A that deals with ANSI strings and another suffixed with a W that deals with wide strings. The headers hide all of this from you with macros. I explain all of this in more detail here—recommended reading.
Anyway, the simple fix here is to explicitly call the wide variant of that function, since you're correctly using wide strings everywhere else. Make all the calls to wsprintf look like this:
wchar_t *ptr = wcsstr(buf, L"StartHTML");
wsprintfW(ptr+10, L"%08u", wcsstr(buf, L"<html>") - buf);
*(ptr+10+8) = L'\r';
Alternatively, you could use the swprintf function provided by the C runtime library instead of the Win32 version. This one works just like the wcsstr and wcscpy functions you're using elsewhere. The w in the name means "wide". The documentation for this series of functions is here.
Note also that when you use character or string literals, they also need to be wide characters. You accomplish that by prepending them with an L. You do that some places, but miss doing it others. Make sure that you do it consistently.
The compiler should warn you about all this, though. You just need to make sure you turn your warning level up and don't ignore any of the warnings. Also make sure that both the UNICODE and _UNICODE preprocessor symbols are defined globally for your project, which ensures that you are always calling the Unicode/wide versions of functions. That should be the default for all new projects anyway.
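As a minimal sketch of the swprintf alternative, the following portable helper overwrites an 8-digit field in a wide string. The name PatchOffsetField is hypothetical. Formatting into a separate 9-character buffer avoids having to restore the character that the terminating null would otherwise overwrite. Note that this only illustrates the wide formatting call; it does not address whether character offsets in a wide buffer are the right values for the clipboard format.

```cpp
#include <cwchar>
#include <string>

// Hypothetical helper: overwrites the 8-digit field that follows `label`
// in `buf` with `value`, zero-padded. Returns false if the field is missing
// or the value needs more than 8 digits.
bool PatchOffsetField(std::wstring& buf, const std::wstring& label, unsigned value) {
    std::size_t pos = buf.find(label);
    if (pos == std::wstring::npos) return false;
    pos += label.size();
    if (pos + 8 > buf.size()) return false;
    wchar_t digits[9]; // 8 digits plus the terminating null
    if (std::swprintf(digits, 9, L"%08u", value) != 8) return false;
    buf.replace(pos, 8, digits, 8);
    return true;
}
```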
This is the function I came up with, with the help of Jochen Arndt at codeproject.com. I hope it helps somebody. Here is the complete working code, if you are interested in checking it out.
It still has one problem: when pasted into OneNote, and only there, gibberish appears after an anchor tag. This does not happen with Word, PowerPoint or Excel, and it does not happen with plain English text. If you have a solution for this, please let me know. The problem seems to be with OneNote, not with the code.
bool setClipboard(LPCWSTR lpszWide){
    int nUtf8Size = ::WideCharToMultiByte(CP_UTF8, 0, lpszWide, -1, NULL, 0, NULL, NULL);
    if (nUtf8Size < 1) return false;
    // The header below is exactly 105 bytes long once its four 10-digit fields are filled in.
    const int nDescLen = 105;
    HGLOBAL hGlobal = ::GlobalAlloc(GMEM_MOVEABLE, nDescLen + nUtf8Size);
    if (NULL == hGlobal) return false;
    bool bOk = false;
    LPSTR lpszBuf = static_cast<LPSTR>(::GlobalLock(hGlobal));
    if (NULL != lpszBuf)
    {
        LPSTR lpszUtf8 = lpszBuf + nDescLen;
        if (::WideCharToMultiByte(CP_UTF8, 0, lpszWide, -1, lpszUtf8, nUtf8Size, NULL, NULL) > 0)
        {
            LPCSTR lpszStartFrag = strstr(lpszUtf8, "<!--StartFragment-->");
            LPCSTR lpszEndFrag = strstr(lpszUtf8, "<!--EndFragment-->");
            if (lpszStartFrag && lpszEndFrag)
            {
                // Skip the marker and the two characters (CR/LF) expected after it.
                lpszStartFrag += strlen("<!--StartFragment-->") + 2;
                // With MSVC's _snprintf, writing exactly nDescLen characters appends no
                // terminating null, so the first byte of the UTF-8 text is not overwritten.
                _snprintf(
                    lpszBuf, nDescLen,
                    "Version:1.0\r\nStartHTML:%010d\r\nEndHTML:%010d\r\nStartFragment:%010d\r\nEndFragment:%010d\r\n",
                    nDescLen,
                    nDescLen + nUtf8Size - 1, // offset to next char behind string
                    nDescLen + static_cast<int>(lpszStartFrag - lpszUtf8),
                    nDescLen + static_cast<int>(lpszEndFrag - lpszUtf8));
                bOk = true;
            }
        }
        ::GlobalUnlock(hGlobal);
    }
    if (bOk)
    {
        bOk = false;
        // Get clipboard id for HTML format...
        UINT cfid = ::RegisterClipboardFormatA("HTML Format");
        // Open the clipboard...
        if (cfid != 0 && ::OpenClipboard(NULL))
        {
            ::EmptyClipboard();
            // On success the clipboard owns hGlobal; it must not be freed afterwards.
            bOk = (::SetClipboardData(cfid, hGlobal) != NULL);
            ::CloseClipboard();
        }
    }
    if (!bOk) ::GlobalFree(hGlobal);
    return bOk;
}
Your problem comes from using wchar_t instead of char in the cited example, which throws off the offset computations.
I would, however, recommend avoiding wchar_t when transferring Unicode text to the clipboard in this format. A UTF-8 character is encoded as a sequence of 1 to 4 bytes, while wchar_t on Windows is a fixed 2-byte type.
As explained in the Microsoft documentation referenced in your question, the clipboard content must be UTF-8, which is identical to ASCII for the characters in the header of the clipboard memory.
To transfer Unicode text through the clipboard, you can prepare the content with the standard char-based C++ facilities (std::string, for example).
While the cited example works, here is another code sample, using the C++ standard library, that can actually copy UTF-8 characters to the clipboard in HTML format:
void copyHTMLtoClipboard(const std::string& html) {
std::string contextStart("Version:0.9\r\nStartHTML:0000000000\r\nEndHTML:0000000000\r\nStartFragment:0000000000\r\nEndFragment:0000000000\r\n<html><body>\r\n<!--StartFragment -->\r\n");
std::string contextEnd("\r\n<!--EndFragment -->\r\n</body></html>");
std::stringstream aux;
aux << contextStart << html << contextEnd;
std::string res = aux.str();
size_t htmlStart = 105 * sizeof(char);
size_t fragmentStart = 119 * sizeof(char);
size_t htmlEnd = res.size() * sizeof(char);
size_t fragmentEnd = htmlEnd - 35 * sizeof(char);
aux.fill('0');
aux.width(10);
aux.seekp(23);
aux << htmlStart;
aux.seekp(43);
aux.fill('0');
aux.width(10);
aux << htmlEnd;
aux.seekp(69);
aux.fill('0');
aux.width(10);
aux << fragmentStart;
aux.seekp(93);
aux.fill('0');
aux.width(10);
aux << fragmentEnd;
res = aux.str();
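HGLOBAL hdst = GlobalAlloc(GMEM_MOVEABLE | GMEM_DDESHARE, htmlEnd + sizeof(char));
if (hdst == NULL) return;
LPSTR dst = (LPSTR)GlobalLock(hdst);
memcpy(dst, res.c_str(), htmlEnd);
dst[htmlEnd] = 0;
GlobalUnlock(hdst);
bool stored = false;
if (OpenClipboard(NULL)) {
EmptyClipboard();
// On success the clipboard owns hdst, so it must not be freed here.
stored = SetClipboardData(RegisterClipboardFormat(L"HTML Format"), hdst) != NULL;
CloseClipboard();
}
// Free the memory only if the clipboard did not take ownership of it.
if (!stored) GlobalFree(hdst);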
}
Note that this code was compiled defining the macros _UNICODE and UNICODE.
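The hard-coded offsets above (105, 119, 35) depend on the exact header and marker strings. As an alternative, here is a minimal portable sketch that computes the byte offsets from the pieces themselves. The name BuildCfHtml is hypothetical, and the sketch only builds the payload as a std::string; placing it on the clipboard would still go through GlobalAlloc and SetClipboardData as shown above.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not from the answers above): wraps a UTF-8 HTML
// fragment in an "HTML Format" clipboard payload with computed byte offsets.
std::string BuildCfHtml(const std::string& fragmentUtf8) {
    static const char kHeaderFormat[] =
        "Version:0.9\r\n"
        "StartHTML:%010u\r\n"
        "EndHTML:%010u\r\n"
        "StartFragment:%010u\r\n"
        "EndFragment:%010u\r\n";
    const std::string prefix = "<html><body>\r\n<!--StartFragment-->";
    const std::string suffix = "<!--EndFragment-->\r\n</body></html>";

    // Every field is zero-padded to 10 digits, so the header length does not
    // depend on the values; format it once with zeros to measure it.
    char header[160];
    const int headerLen =
        std::snprintf(header, sizeof header, kHeaderFormat, 0u, 0u, 0u, 0u);
    if (headerLen <= 0 || headerLen >= static_cast<int>(sizeof header)) {
        return std::string();
    }

    // All offsets are byte counts from the start of the payload.
    const unsigned startHtml = static_cast<unsigned>(headerLen);
    const unsigned startFragment = startHtml + static_cast<unsigned>(prefix.size());
    const unsigned endFragment = startFragment + static_cast<unsigned>(fragmentUtf8.size());
    const unsigned endHtml = endFragment + static_cast<unsigned>(suffix.size());

    std::snprintf(header, sizeof header, kHeaderFormat,
                  startHtml, endHtml, startFragment, endFragment);
    return std::string(header, static_cast<std::size_t>(headerLen)) +
           prefix + fragmentUtf8 + suffix;
}
```

Because the sizes are taken from the UTF-8 bytes, multi-byte characters in the fragment are counted correctly without any special handling.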

C++/CLI UTF-8 & JNI Not Converting Unicode String Properly

I have a Java class that returns a Unicode string. Java has the correct version of the string, but when it comes through a JNI wrapper as a jstring it must be converted to a C++ or C++/CLI string. Here is some test code that works for most languages except the Asian character sets: Simplified Chinese and Japanese characters are garbled and I can't figure out why. I don't see anything wrong with either method of conversion (the if statement checks the OS, as I have two VMs with different OSes, and runs the appropriate conversion method).
String^ JStringToCliString(const jstring string){
String^ converted = gcnew String("");
JNIEnv* envLoc = GetJniEnvHandle();
std::wstring value;
jboolean isCopy;
if(string){
try{
jsize len = envLoc->GetStringLength(string);
if(Environment::OSVersion->Version->Major >= 6) // 6 is post XP/2003
{
TraceLog::Log("Using GetStringChars() for string conversion");
const jchar* raw = envLoc->GetStringChars(string, &isCopy);
// todo add exception handling here for jvm
if (raw != NULL) {
value.assign(raw, raw + len);
converted = gcnew String(value.c_str());
envLoc->ReleaseStringChars(string, raw);
}
}else{
TraceLog::Log("Using GetStringUTFChars() for string conversion.");
const char* raw = envLoc->GetStringUTFChars(string, &isCopy);
if(raw) {
int bufSize = MultiByteToWideChar(CP_UTF8, 0 , raw , -1, NULL , 0 );
wchar_t* wstr = new wchar_t[bufSize];
MultiByteToWideChar( CP_UTF8 , 0 , raw , -1, wstr , bufSize );
String^ val = gcnew String(wstr);
delete[] wstr;
converted = val; // partially working
envLoc->ReleaseStringUTFChars(string, raw);
}
}
}catch(Exception^ ex){
TraceLog::Log(ex->Message);
}
}
return converted;
}
The answer was to enable East Asian language support in Windows XP; Windows 7 and later work fine. Super easy... a waste of an entire day, lol.