I'm trying to insert some UTF-8 strings into a PostgreSQL database. I'm using Visual C++ and MFC (this bit is probably not important) with the project setting "Use Multi-Byte Character Set" (I'm trying to switch databases in an old legacy app). So when I execute an INSERT command with some Cyrillic text, "АБВГ", I expect to see that text in the database, but I see this instead (in DBeaver): "ÐБВГ". I produce the text by converting the string "\xC0\xC1\xC2\xC3" from code page 1251 to CP_UTF8.
When I change the system setting "Language for non-Unicode programs" from English to a Cyrillic language such as Russian, the text actually inserted is no longer "ÐБВГ" but "АБВГ". The Postgres ODBC driver apparently uses CP_ACP to interpret my multi-byte strings. Indeed, if I then insert "\xC0\xC1\xC2\xC3" directly (without converting to UTF-8), I do see "АБВГ" in the database. But I need to insert UTF-8 strings, not strings restricted to a single code page.
How do I instruct the Postgres ODBC driver to interpret my strings as UTF-8, and ignore the "Language for non-Unicode programs" system setting?
In the psql console, both server_encoding and client_encoding are set to UTF8.
Change your ODBC DSN connection string to include this: ConnSettings=SET CLIENT_ENCODING TO 'UTF8';
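For example, a full psqlODBC connection string might look like this (the driver name assumes the Unicode build of psqlODBC; server, database, and credentials are placeholders for your own):

```text
Driver={PostgreSQL Unicode};Server=localhost;Port=5432;Database=mydb;Uid=myuser;Pwd=secret;ConnSettings=SET CLIENT_ENCODING TO 'UTF8';
```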
While writing records to a flat file using an Informatica ETL job, Greek characters come out as boxes. We can see the original characters in the database. At the session level, we are using UTF-8 encoding. We have a multi-language application and need to process Chinese, Russian, Greek, Polish, Japanese, etc. characters. Please suggest a fix.
Try changing your code page encoding. I also faced this kind of issue: we were using ANSI encoding, so we created a separate integration service with a different encoding and the file ran successfully.
There is an easier option. In the session properties, select the target flat file, then click "Set File Properties". There you can change the code page and choose UTF-8. By default it is ANSI, which is why you are facing this issue.
I am finishing an application in Visual C++/Windows API, and I am using the MySQL C Connector.
The whole application code uses ANSI, and the MySQL C Connector is in ANSI too.
This program will be used on Polish and German computers with Windows XP/Vista/7 or 8.
I want to correctly display German umlauts and Polish accented characters on:
DialogBox controls (strings are loaded from language files)
Generated XHTML documents
Strings retrieved from MySql database displayed on controls and in XHTML documents
I have heard about MultiByteToWideChar and the Unicode functions (MessageBoxW etc.), but the application code is nearly finished, and converting it would be a lot of work...
How can I get the character encoding right with the least work and time?
Maybe by changing the system code page for non-Unicode programs?
First, of course: what code set is MySQL returning? Or perhaps: what code set was used when writing the data into the database? Other than that, I don't think you'll be able to avoid using either wide characters or multibyte characters: for single-byte characters, German would use ISO 8859-1 (close to Windows code page 1252) or ISO 8859-15, and Polish ISO 8859-2 (close to Windows code page 1250). But what are you doing with the characters in your own code? You may be able to get away with UTF-8 (code page 65001) without many changes. The real question is where the characters originally come from (although it might not be too difficult to translate them into UTF-8 immediately at the source); I don't think that Windows respects the code page for input.
Although it doesn't help you much to know it, you're dealing with an almost impossible problem, since so much depends on things outside your program: things like the encoding of the display font, or the keyboard driver, for example. In fact, it's not rare for programs to display one thing on the screen and something different when outputting to the printer, or to display one thing on the screen but something different when the data is written to a file and read with another program. The situation is improving: modern Unix and the Internet are gradually (very gradually) standardizing on UTF-8, everywhere and for everything, and Windows normally uses UTF-16 for everything that is pure Windows (but needs to support UTF-8 for the Internet). But even using the platform standard won't help if the human client has installed (and is using) fonts which don't have the characters you need.
While working on a C++ project, we decided to use the MongoDB database for storing some of our application's data. I have spent a week linking and compiling the C++ driver, and it works now. But there is one problem: strings like
bob.append("name", "some text with cyrilic symbols абвгд");
are added incorrectly, and after extraction from the database they look like 4-5 Chinese characters.
I have found no documentation about using Unicode in MongoDB, so I cannot work out how to write Unicode to the database.
Your example, and the example code in the C++ tutorial on mongodb.org, work fine for me on Ubuntu 11.10. My locale is en_US.UTF-8, and the source files I create are UTF-8.
MongoDB stores data in BSON, and BSON strings are UTF-8, which can represent any Unicode character (including Cyrillic). I think the C++ API assumes strings are UTF-8 encoded, but I'm not sure.
Here are some ideas:
If your code above (bob.append("name"... etc) is in a C++ source code file, try encoding that file as UTF-8.
Try inserting Unicode characters via the mongodb shell.
I have a non-Unicode application which uses the Unicode versions of the INI-reading functions, such as GetPrivateProfileSectionW and GetPrivateProfileStringW. The program works well when "Language for non-Unicode programs" is set to English.
When I change this setting to Chinese (PRC), GetPrivateProfileSectionW and GetPrivateProfileStringW return nothing.
I must keep this setting at Chinese, because when English is selected for "Language for non-Unicode programs", CComBSTR::LoadString does not work as expected: it loads the Chinese characters in a resource DLL as question marks.
Any ideas?
Thanks.
Michael Kaplan explains. The solution is to use Unicode INI files, which don't depend on the "Language for non-unicode programs".
The "Language for non-Unicode programs" setting also selects the default code page used for files. For US English it is usually Windows-1252; for Chinese it will be something different, like GB2312 or GBK. Open your .INI file with Notepad and save it in the "ANSI" format, which will be whatever Microsoft's default code page is for the selected non-Unicode language.
Code:
mysqlpp::Query acc_query = connection->query("SELECT * FROM accounts;");
In the Visual Studio debugger, the code above produces:
_Gfirst = 0x00c67718 "SELECT * FROM accounts;ээээ««««««««юоюою"
It appears to cause my query to fail with weird results.
Has anyone else encountered it?
It's best to use UTF-8 encoding with MySQL. Code pages are a Windows-centric pre-Unicode concept. Your use of them instead of Unicode probably explains why you're having problems. While it's possible to make MySQL — and thus MySQL++ — work with Windows-style code pages, you shouldn't be doing that in 2010.
If you are using Unicode, it's probably UTF-16 encoding (Windows' native encoding in the NT derivatives), which again explains a lot.
Convert all string data into UTF-8 form before sending it to MySQL, and configure MySQL to use UTF-8 encoding in its tables.
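Assuming the accounts table from the question, the MySQL side of that configuration looks roughly like this (the statements are examples against a hypothetical schema, not your exact setup):

```sql
-- Tell the server what encoding this connection sends and expects.
-- MySQL++ can also set this through its connection options.
SET NAMES 'utf8';

-- Store the table's text columns as UTF-8 as well.
ALTER TABLE accounts CONVERT TO CHARACTER SET utf8;
```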