I read a German text from an SQLite database with C++ (the text looks fine in the database viewer). But when I display it in a dialog with SetDlgItemText, the text looks like this (see the picture).
CString strWarning(pStmt->GetColumnCString(nCol));
SetDlgItemText(IDC_WARNING_MESSAGE, strWarning);
Your string looks like it's encoded as UTF-8, which the ANSI version of SetDlgItemText doesn't expect: it interprets char strings in the system's ANSI code page.
You'll need to convert it to UTF-16 and ensure that you're calling the wide version of SetDlgItemText, either by changing your project's character set option to Use Unicode Character Set or specifying SetDlgItemTextW.
You can convert your string from UTF-8 to UTF-16 with the MultiByteToWideChar function.
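A minimal sketch of that conversion step. On Windows you would normally call MultiByteToWideChar with CP_UTF8; the portable function below (utf8_to_utf16 is a hypothetical name, not a Windows API) does the same decoding by hand so the idea can be followed and tested anywhere. It is a simplified decoder: anything it cannot decode becomes U+FFFD, and overlong encodings are not rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
std::u16string utf8_to_utf16(const std::string& in) {
    static const char16_t kReplacement = 0xFFFD;
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else { out.push_back(kReplacement); ++i; continue; }  // stray or invalid lead byte
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            // Every continuation byte must look like 10xxxxxx.
            if (i >= in.size() || (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80) {
                ok = false;
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
            ++i;
        }
        if (!ok || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            out.push_back(kReplacement);
            continue;
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP need a surrogate pair in UTF-16.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The resulting UTF-16 text is what the wide call, SetDlgItemTextW, expects.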
Related
I'm stuck trying to convert an input string held in a char* back to its Chinese characters. An application accepts a Chinese string as input, e.g. "啊说到", and when it is written to a file it turns into "°¡Ëµµ½". I can take this input and feed it to _mbstowcs_s_l(), but the solution needs to be locale-independent, so I'm forced to use either mbstowcs() or WideCharToMultiByte(). It looks like both would only work for me if the input had already been converted from MBCS to UTF-8, which in our case it hasn't.
The project uses the Multibyte Character Set, and I'm struggling to understand what is going on. One other thing: the input comes from a different application, which stores it in a file.
The application that accepted the Chinese input is an MFC application set to Multibyte Character Set, and the OS was set to the Chinese (Simplified) region. The UI accepts the input and places it in a CString, which is copied to a char*. This is the part where I don't know what is happening to the encoding. That application stores the string in a file, then we read it with the other application into a char*, and that is when the characters appear as "°¡Ëµµ½".
The question is: how can I turn this encoded char* "°¡Ëµµ½" back into its Chinese form "啊说到" without setting the locale in _mbstowcs_s_l()? The problem is that we could be reading strings from other regional settings, and the application wouldn't know which character map to use unless we tell it.
I have to write German text to a PDF created by libharu. I assign the German text to a string variable (i.e. std::string TestString = "VariableGesamtlänge";) and then put that text into the PDF. My simple code is as follows:
//-----UTF8 Encoding
HPDF_UseUTFEncodings(pdf);
HPDF_SetCurrentEncoder(pdf, "UTF-8");
const char *fontname = HPDF_LoadTTFontFromFile(pdf, "FreeSans.ttf", HPDF_TRUE);
HPDF_Font font = HPDF_GetFont(pdf, fontname, "UTF-8");
HPDF_Page_SetFontAndSize(page, font, 24);
std::string TestString = "VariableGesamtlänge";
DrawText(page, font, TestString.c_str(), y);
Problem: I get two square boxes instead of ä. I am using VS2010.
'ä' is not an ASCII character. It may be stored as a single character (in which case, which one?), or it may be stored as multiple characters (in which case, which ones?).
You have told the HPDF functions that you are going to pass text around as UTF-8 (which is an entirely sensible choice). This means 'ä' is represented by 0xC3 0xA4.
The source file is almost certainly encoded in 8-bit text, using (probably) code-page 1252. So 'ä' will be the single character 0xE4. You either need to tell the compiler to store strings as UTF-8, or it may be possible to re-encode the source files in UTF-8.
Your final option is to store the text in a (UTF-8) file, and read it from there.
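To make the byte-level difference concrete, here is a small sketch. It assumes the string literal ended up as single-byte text where 'ä' is 0xE4, and re-encodes it as UTF-8 before it would be handed to the HPDF calls. latin1_to_utf8 is a hypothetical helper, not part of libharu, and it treats the input as ISO-8859-1, which matches code page 1252 everywhere except the bytes 0x80 to 0x9F.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: re-encode ISO-8859-1 bytes as UTF-8.
// Every byte >= 0x80 becomes a two-byte UTF-8 sequence.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));                  // ASCII stays as-is
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));    // lead byte
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation byte
        }
    }
    return out;
}
```

With this, latin1_to_utf8("VariableGesamtl\xE4nge") produces the same bytes as the literal "VariableGesamtl\xC3\xA4nge". Writing the literal with explicit escapes like that is another way to sidestep the source-file encoding.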
I'm facing one small problem. I am from a country whose language uses an extended character set (specifically Latin Extended-A, because of characters like š,č,ť,ý,á,...).
I have an INI file containing these characters and I would like to read them into my program. Unfortunately, it is not working with GetPrivateProfileStringW or ...A.
Here is part of source code. I hope it will help someone to find solution, because I am getting a little desperate. :-)
SOURCE CODE:
wchar_t pcMyExtendedString[200];
GetPrivateProfileStringA(
"CATEGORY_NAME",
"SECTION_NAME",
"error",
pcMyExtendedString,
200,
PATH_TO_INI_FILE
);
INI FILE:
[CATEGORY_NAME]
SECTION_NAME= ľščťžýáíé
Characters ý,á,í,é are read correctly - they are from the Latin-1 Supplement character set. Their hex values are correct (0xFD, 0xE1, 0xED,...).
Characters ľ,š,č,ť,ž are read incorrectly - they are from the Latin Extended-A character set. Their hex values are incorrect (0xBE, 0x9A, 0xE8,...). The expected values are 0x013E, 0x0161, 0x010D, ...
How can this be done? Is it possible, or should I avoid these characters altogether?
GetPrivateProfileString doesn't do any character conversion. If the call succeeds, it gives you exactly what is in the file.
Since you want Unicode characters, your file is probably in UTF-8 or UTF-16. If your file is UTF-8, you should be able to read it with GetPrivateProfileStringA, but it will give you a char array containing the UTF-8 bytes (that is, not 0x013E, because 0x013E is a code point, not its UTF-8 encoding).
If your file is UTF-16, then GetPrivateProfileStringW should work, and give you the UTF-16 codes (0x013E, 0x0161, 0x010D, ...) in a wchar_t array.
Edit: Actually your file is encoded in Windows-1250. This is a single byte encoding, so GetPrivateProfileStringA works fine, and you can convert it to UTF-16 if you want by using MultiByteToWideChar with 1250 as code page parameter.
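As a sketch of what that conversion does: on Windows the call would be MultiByteToWideChar with 1250 as the code page; the portable stand-in below maps only the characters from the question by hand. The table is deliberately partial and cp1250_to_utf16 is a hypothetical name; any high byte not in the table becomes U+FFFD.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Partial Windows-1250 table: only the letters from the INI example.
static char16_t cp1250_unit(unsigned char b) {
    if (b < 0x80) return b;          // ASCII is identical in code page 1250
    switch (b) {
        case 0xBE: return 0x013E;    // l with caron
        case 0x9A: return 0x0161;    // s with caron
        case 0xE8: return 0x010D;    // c with caron
        case 0x9D: return 0x0165;    // t with caron
        case 0x9E: return 0x017E;    // z with caron
        case 0xFD: return 0x00FD;    // y with acute
        case 0xE1: return 0x00E1;    // a with acute
        case 0xED: return 0x00ED;    // i with acute
        case 0xE9: return 0x00E9;    // e with acute
        default:   return 0xFFFD;    // not covered by this sketch
    }
}

// Hypothetical helper: widen a code page 1250 byte string to UTF-16.
std::u16string cp1250_to_utf16(const std::string& in) {
    std::u16string out;
    for (std::size_t i = 0; i < in.size(); ++i)
        out.push_back(cp1250_unit(static_cast<unsigned char>(in[i])));
    return out;
}
```

Note that the buffer passed to GetPrivateProfileStringA has to be a char array; the wide result only exists after the conversion step.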
Try saving the file as UTF-8 (code page 65001); most likely your file is currently in Western European (Windows), code page 1252.
I have a website which allows users to input usernames.
The problem here is that the C++ code assumes the browser encoding is Western European and converts the string received from the username text box into Unicode to compare it with the string stored in the database.
With the right browser encoding set, the string úser is received as %FAser and converted properly to úser within the program.
However, with the browser set to UTF-8, the string is received as %C3%BAser and then converted to Ãºser, because the code converts C3 and BA as separate characters.
Is there a way to convert the example %C3%BA to ú while ensuring the right conversions are being made?
You can use the ICU library to convert between almost all usable encodings. This library also provides lots of string manipulation facilities.
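Whichever library does the final conversion, the percent-escapes have to be turned back into raw bytes first, and the program then has to decide which encoding those bytes are in. One possible approach is sketched below with hypothetical helpers (this is not ICU's API): decode the escapes, then treat the result as UTF-8 only if it is structurally valid UTF-8, and fall back to Western European otherwise. This is a heuristic, not a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Turn %XX escapes back into raw bytes; malformed escapes are copied through unchanged.
std::string percent_decode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hex_value(in[i + 1]);
            int lo = hex_value(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;
                continue;
            }
        }
        out.push_back(in[i]);
    }
    return out;
}

// Structure check only: lead bytes followed by the right number of
// continuation bytes. Overlong forms are not rejected in this sketch.
bool looks_like_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t extra;
        if (b < 0x80) extra = 0;
        else if ((b & 0xE0) == 0xC0) extra = 1;
        else if ((b & 0xF0) == 0xE0) extra = 2;
        else if ((b & 0xF8) == 0xF0) extra = 3;
        else return false;
        ++i;
        for (std::size_t k = 0; k < extra; ++k, ++i) {
            if (i >= s.size() || (static_cast<unsigned char>(s[i]) & 0xC0) != 0x80)
                return false;
        }
    }
    return true;
}
```

For the two inputs from the question, percent_decode("%C3%BAser") passes the check and should be decoded as UTF-8, while percent_decode("%FAser") fails it (0xFA cannot start a UTF-8 sequence) and would take the Western European path.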
In my program I used wstring to print the text I needed, but it gave me random characters (the kind you get from a mismatched encoding scheme). For example, I have this block of code.
wstring text;
text.append(L"Some text");
Then I use DirectX to render it on screen. I used to use wchar_t, but I heard it has portability problems, so I switched to wstring. wchar_t worked fine, although as far as I can tell it only handled English characters (the output simply ignored any non-English character entered), which was fine until I switched to wstring: then I only got random characters that looked like Chinese and Korean mixed together. Interestingly, my computer's locale for non-Unicode text is Chinese. Based on what I saw, I suspected that it would render Chinese characters correctly, so I tried that, and it does display the character correctly but with a square in front of it (which is still not quite right). I then guessed that the encoding might depend on the language locale, so I switched the locale to English (US) (I use Windows 8) and restarted. My Chinese test character in the source file turned into random stuff (my file is not saved in a Unicode format, since all the text is English). I then tried with English characters, but no luck: the display was exactly the same and seemed to have nothing to do with the locale. I don't understand why it doesn't display correctly and looks like Asian characters even with the English locale.
Is there some conversion that should be done, or should I save my file in a different encoding? The problem is that I want to display English characters correctly, which is the default.
In the absence of code that demonstrates your problem, I will give you a correspondingly general answer.
You are trying to display English characters, but see Chinese characters. That is what happens when you pass 8 bit ANSI text to an API that receives UTF-16 text. Look for somewhere in your program where you cast from char* to wchar_t*.
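To see why that produces CJK-looking glyphs, the sketch below reproduces what such a cast does without actually performing it: it pairs up the bytes of an 8-bit string into 16-bit units the way a little-endian machine would read them, and contrasts that with widening each character properly. Both function names are made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// What reading a char buffer through a wchar_t* effectively yields on
// little-endian Windows: every two bytes become one UTF-16 code unit.
std::u16string misread_as_utf16le(const std::string& narrow) {
    std::u16string out;
    for (std::size_t i = 0; i + 1 < narrow.size(); i += 2) {
        unsigned lo = static_cast<unsigned char>(narrow[i]);
        unsigned hi = static_cast<unsigned char>(narrow[i + 1]);
        out.push_back(static_cast<char16_t>(lo | (hi << 8)));
    }
    return out;
}

// Correct for plain ASCII text: one code unit per character.
std::u16string widen_ascii(const std::string& narrow) {
    std::u16string out;
    for (std::size_t i = 0; i < narrow.size(); ++i)
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(narrow[i])));
    return out;
}
```

For the input "Some", the misread version yields the two code units 0x6F53 and 0x656D, both of which fall in the CJK Unified Ideographs block, while widen_ascii yields the four expected letters.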
First of all, what type of file are you trying to store the text in? Plain .txt files are stored as ANSI by default (so does Excel). So when you try to write a Unicode character to an ANSI file, it will print junk. There are two ways of overcoming this problem:
Open the file in UTF-8 or UTF-16 mode and then write.
Convert Unicode to ANSI before writing to the file. If you are using Windows, MSDN documents an API for converting Unicode to ANSI and vice versa. If you are using Linux, search for Unicode-to-ANSI conversion; there are a lot of solutions out there.
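For the first option, one way to picture it is to convert the wide text to UTF-8 bytes yourself and write those bytes in binary mode. The sketch below assumes the text is held as UTF-16 in a std::u16string (on Windows, WideCharToMultiByte with CP_UTF8 performs the equivalent conversion for wchar_t strings). utf16_to_utf8 and write_utf8_file are hypothetical helpers, and unpaired surrogates are replaced with U+FFFD.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Append one code point to 'out' as 1 to 4 UTF-8 bytes.
static void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate
        }
        append_utf8(out, cp);
    }
    return out;
}

// Write the UTF-8 bytes untouched (binary mode avoids newline translation).
bool write_utf8_file(const std::string& path, const std::u16string& text) {
    std::ofstream f(path.c_str(), std::ios::binary);
    if (!f) return false;
    const std::string bytes = utf16_to_utf8(text);
    f.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return f.good();
}
```

Whatever reads the file back then has to be told it is UTF-8; the bytes themselves carry no marker in this sketch.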
Hope this helps!!!
std::wstring does not have any locale/internationalisation support at all. It is just a container for storing sequences of wchar_t.
The problem with wchar_t is that its encoding is unspecified. It might be Unicode UTF-16, or Unicode UTF-32, or Shift-JIS, or something completely different. There is no way to tell from within a program.
You will have the best chances of getting things to work if you ensure that the encoding of your source code is the same as the encoding used by the locale under which the program will run.
But, the use of third-party libraries (like DirectX) can place additional constraints due to possible limitations in what encodings those libraries expect and support.
Bug solved: it turned out to be a casting problem (not a rendering problem, as I previously said).
The garbled text is an intermediate product of an internal conversion using wstringstream (which I forgot to mention). The code is as follows:
wstringstream wss;
wstring text;
text.append(L"some text");
wss << timer->getTime();
text.append(wss.str());
Right after this step the debugger shows the text as a bunch of random stuff, but later it somehow becomes readable again. The real problem appears at the rendering stage using DirectX: I had left in a cast to wchar_t*, which results in the incorrect rendering.
old:
LPCWSTR lpcwstrText = (LPCWSTR)textToDraw->getText();
new:
LPCWSTR lpcwstrText = (*textToDraw->getText()).c_str();
Changing that solves the problem.
So this was caused by a bad cast, as some kind people pointed out when correcting my statement.
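For readers who hit the same thing, a reduced sketch of the fix. TextItem is a hypothetical stand-in for the poster's class, assuming getText() returns a pointer to a std::wstring, as the corrected line implies. A C-style cast from std::wstring* to a character pointer compiles, but it points at the string object itself rather than at its characters; c_str() returns the actual character buffer.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the class behind textToDraw.
class TextItem {
public:
    explicit TextItem(const std::wstring& s) : text_(s) {}
    const std::wstring* getText() const { return &text_; }
private:
    std::wstring text_;
};

// Wrong: (const wchar_t*)item.getText() reinterprets the std::wstring object's own bytes.
// Right: ask the string for its character buffer.
const wchar_t* text_for_rendering(const TextItem& item) {
    return item.getText()->c_str();
}
```

The returned pointer stays valid only as long as the TextItem and its string are alive and unmodified.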