MongoDB C++ driver: string encoding

While working on a C++ project we decided to use the MongoDB database to store some of our application's data. I spent a week linking and compiling the C++ driver, and it works now. But there is one problem: strings like
bob.append("name", "some text with cyrilic symbols абвгд");
are stored incorrectly, and after extraction from the database they look like 4-5 Chinese characters.
I have found no documentation about Unicode in MongoDB, so I cannot figure out how to write Unicode strings to the database.

Your example, and the example code in the C++ tutorial on mongodb.org, work fine for me on Ubuntu 11.10. My locale is en_US.UTF-8, and the source files I create are UTF-8.
MongoDB stores data in BSON, BSON strings are UTF-8, and UTF-8 can handle any Unicode character (including Cyrillic). I think the C++ API assumes strings are UTF-8 encoded, but I'm not sure.
Here are some ideas:
If your code above (bob.append("name"... etc) is in a C++ source code file, try encoding that file as UTF-8.
Try inserting Unicode characters via the mongodb shell.
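As a quick sanity check that a literal really is UTF-8 before handing it to the driver, you can compare its byte length with its code-point count. This is a minimal sketch (the helper name is just for illustration, not part of any driver API):

```cpp
#include <cassert>
#include <cstring>

// Count UTF-8 code points by counting lead bytes, i.e. bytes that are
// not 10xxxxxx continuation bytes. Assumes the input is valid UTF-8.
std::size_t utf8_code_points(const char* s) {
    std::size_t n = 0;
    for (; *s; ++s)
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80)
            ++n;
    return n;
}
```

If the source file is saved as UTF-8, `strlen("абвгд")` is 10 (two bytes per Cyrillic letter) while `utf8_code_points("абвгд")` is 5; a byte count of 5 for those five letters would mean the literal was compiled in a legacy single-byte encoding instead.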

Related

How do I read unicode characters from SQL server using SQLAPI++?

I'm using SQLAPI++ to build a backend application which needs to access a database (SQL Server 2014). When I try to read a string (nvarchar(50)) from a result set that contains non-ASCII characters (specifically Persian characters), the cmd.Field("MyField").asString().GetxxxChars() methods all return question marks (?) instead of those characters.
What should I do?
I have also tried asBytes() and asLongChar() and got the same results.
So I found the problem!
I was linking with sqlapis.lib. I checked this link and found out that I should use sqlapius.lib instead for Unicode support. I also needed to define SA_UNICODE.
It is now working fine.

Qt internationalization from native language

I am going to write software in Qt. Its string literals should be written in a native (non-English) language, and they should support internationalization. The Qt docs advise using the tr() function for this: http://doc.qt.io/qt-5/i18n-source-translation.html
So I try to write:
edit->setText(tr("Фильтр"));
and I can see only question marks in the running app.
If I replace it with QString::fromStdWString:
edit->setText(QString::fromStdWString(L"Фильтр"));
then I can see the correct text in my language.
So the question is: how should I write non-ASCII strings so that they are displayed correctly and can be translated using Qt Linguist?
PS: I use UTF-8 encoding for all source files; the compiler is VS2013.
PS2: I have found the QTextCodec::setCodecForTr() function, but it was removed from Qt 5.4.
I think the best option is to use some kind of Latin-1 transliteration in the program source. Then it's possible to implement both the Russian and English versions as normal Qt translations.
BTW, it's possible, with some additional work, to use even plain numbers as translation placeholders, just like MFC did.
I found a strange solution to my problem:
By default VS saves files as UTF-8 with a BOM. In File -> Advanced Save Options I chose to save the file as UTF-8 without BOM, and everything works like a charm:
edit->setText(tr("Фильтр"));
It looks like a VS compiler bug. Interestingly, MS claims that its compiler supports Unicode source only for UTF-8 with a BOM: https://msdn.microsoft.com/en-us/library/xwy0e8f2.aspx
PS: the length of "Фильтр" is 12 bytes, so it really is a UTF-8 string.
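The byte arithmetic in the PS can be checked directly. This sketch assumes the source file is saved as UTF-8 and the compiler uses a UTF-8 execution charset (the GCC/Clang default; for MSVC this is exactly what the BOM discussion above is about):

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>

// "Фильтр" is 6 Cyrillic letters: 12 bytes as a narrow UTF-8 literal
// (2 bytes per letter), but 6 wchar_t units as a wide literal.
bool literal_is_utf8() {
    return std::strlen("Фильтр") == 12 && std::wcslen(L"Фильтр") == 6;
}
```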

Japanese characters are not written correctly when saving to a file

I have a .NET based Excel addin that uses a C++/CLI library to read/write proprietary files. The C++/CLI library links to some core C++ libraries that provide classes to read and write these files. The core classes use std::string and std::i/ofstream to read/write data in proprietary files.
So when saving data, it goes from:
Excel >> .NET AddIn (string) >> C++/CLI Lib (System::String) >> C++ Core Lib (std::string)
All works fine with simple ASCII text files. Now I have a text file (ANSI encoding) with some Japanese characters in it, saved on a Japanese machine. I think it uses the Shift-JIS encoding by default. This file loads fine (I see the same characters in Excel as in Notepad), but if I save it back unmodified then the characters change to ??. I think it's because the std::string and std::ofstream classes are writing it incorrectly as a plain ASCII stream.
I use the following syntax while reading the file to convert them to .NET strings:
%String(mystring.c_str());
and the following while converting them from .NET strings to std::strings while writing:
msclr::interop::marshal_as<std::string>(mydotnetstring)
The problem seems to be with encoding, but I am not crystal clear on what exactly is happening. I want to understand why the file is read correctly but not written correctly.
I have modified my application to read/write UTF-8, and that solves the problem, but I still want to know the underlying cause.
Okay, I think I have found the underlying problem. msclr::interop::marshal_as<std::string> internally calls the WideCharToMultiByte API with the CP_THREAD_ACP option, which means that the code page of the active thread is used. This .NET addin runs inside the Excel process, and the current thread has a different code page (932 on a Japanese system) than the default code page (1252). I verified this by checking the return value of the marshal_as call in a sample application vs. the addin on a Japanese machine. The sample application converted a two-character Japanese string to 4 bytes, whereas the addin converted it to 2 unknown '?' bytes.
SOLUTION
marshal_as does not provide a way to change this, so the solution is to marshal the .NET strings by calling the WideCharToMultiByte API directly with the CP_ACP option. It worked for me.
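A portable sketch of the same idea is to pick the target encoding explicitly instead of relying on the thread's code page; on Windows, WideCharToMultiByte with CP_UTF8 does the equivalent. This hand-rolled encoder is illustrative only and deliberately minimal (it ignores surrogate pairs, so it only covers code points up to U+FFFF):

```cpp
#include <cassert>
#include <string>

// Encode a wide string as UTF-8 explicitly, rather than letting the
// conversion depend on the active thread's code page. Minimal sketch:
// handles code points up to U+FFFF only (no surrogate-pair handling).
std::string to_utf8(const std::wstring& ws) {
    std::string out;
    for (wchar_t wc : ws) {
        const unsigned cp = static_cast<unsigned>(wc);
        if (cp < 0x80) {                          // 1-byte sequence (ASCII)
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                  // 2-byte sequence
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                  // 3-byte sequence
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With this, to_utf8(L"日本") yields six bytes (three per character) no matter which code page the host thread happens to use.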

Using an ini file without Unicode

Is there any provision in WinAPI or otherwise for using ini files (or similar style config files) without having to use LPCWSTRs for most things?
My app uses single-byte ASCII strings throughout, and I've just got round to reading the ini file. Unicode strings are proving difficult to deal with and convert between.
If I can't find something fairly simple I think I will just use fstream and be done with it.
.INI files are very old; they existed decades before Unicode was introduced, and they are simple ASCII files. Tons of applications (including mine) work with them using the plain ANSI API, such as GetPrivateProfileString.
If your application defaults to Unicode, you can explicitly call GetPrivateProfileStringA. This forces all of its parameters to be plain narrow strings.
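For the fstream fallback the asker mentions, a minimal narrow-string key=value parser is only a few lines. This is a sketch only: it skips [section] headers and ; comments rather than honoring them, and does no whitespace trimming:

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Parse "key=value" lines from an .ini-style stream into a map of
// plain narrow strings. Section headers and comment lines are skipped.
std::map<std::string, std::string> read_ini(std::istream& in) {
    std::map<std::string, std::string> result;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == ';' || line[0] == '[')
            continue;                       // skip blanks, comments, sections
        const std::size_t eq = line.find('=');
        if (eq == std::string::npos)
            continue;                       // not a key=value line
        result[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return result;
}
```

Feeding it an std::ifstream (or any std::istream) avoids LPCWSTR entirely, at the cost of reimplementing whatever GetPrivateProfileString features you actually need.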

Codepage related problems with MySQL++

Code:
mysqlpp::Query acc_query = connection->query("SELECT * FROM accounts;");
In the Visual Studio debugger, the query string appears as:
_Gfirst = 0x00c67718 "SELECT * FROM accounts;ээээ««««««««юоюою"
It appears to cause my query to fail with weird results.
Has anyone else encountered it?
It's best to use UTF-8 encoding with MySQL. Code pages are a Windows-centric pre-Unicode concept. Your use of them instead of Unicode probably explains why you're having problems. While it's possible to make MySQL — and thus MySQL++ — work with Windows-style code pages, you shouldn't be doing that in 2010.
If you are using Unicode, it's probably UTF-16 encoding (Windows' native encoding in the NT derivatives), which again explains a lot.
Convert all string data into UTF-8 form before sending it to MySQL, and configure MySQL to use UTF-8 encoding in its tables.
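Before sending string data, it can help to assert that it is at least structurally valid UTF-8. A minimal checker (a sketch: it validates lead/continuation byte patterns only, and does not reject overlong encodings or surrogate code points) might look like:

```cpp
#include <cassert>
#include <string>

// Check that a string is structurally valid UTF-8: every lead byte is
// followed by the right number of 10xxxxxx continuation bytes.
bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        if (c < 0x80)                len = 1;  // ASCII
        else if ((c & 0xE0) == 0xC0) len = 2;  // 110xxxxx lead byte
        else if ((c & 0xF0) == 0xE0) len = 3;  // 1110xxxx lead byte
        else if ((c & 0xF8) == 0xF0) len = 4;  // 11110xxx lead byte
        else return false;                     // stray continuation/invalid
        if (i + len > s.size())
            return false;                      // truncated sequence
        for (std::size_t j = 1; j < len; ++j)
            if ((static_cast<unsigned char>(s[i + j]) & 0xC0) != 0x80)
                return false;                  // bad continuation byte
        i += len;
    }
    return true;
}
```

Strings that fail this check would be mangled (or rejected) by a server expecting UTF-8, so it makes a cheap guard in front of query construction.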