C++ problems when copying text data between ActiveDirectory and Sqlite3 - c++

I wrote a C++ program to retrieve some text data from MS Active Directory and save it into a Sqlite3 database, but I have a problem with the UTF-8 encoding.
According to what I have read, data from Active Directory is UTF-8 encoded, but when I read it from C++ it arrives as wide characters (wchar_t). Sqlite3 (which defaults to UTF-8) does not accept that as UTF-8, because sqlite3_bind_text takes only char for its text parameter. I could use sqlite3_bind_text16 instead, but I do not want to, because I expect it to increase the size of the database.
I tried converting from wchar_t to char with wcstombs_s, but the resulting data is not correct.
I read that the only way would be to use MultiByteToWideChar or WideCharToMultiByte, but I have not tried them because I read that the conversion is quite expensive.
I would like to know whether any of you has had a similar situation and found a clean and effective solution for this.
Many Thanks!

Using sqlite3_bind_text16 on a database created with the UTF-8 encoding doesn't increase its size; the strings are converted on the fly to UTF-8.
See the paragraph "Support for UTF-8 and UTF-16" on this page.
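If you would rather do the conversion yourself and keep calling sqlite3_bind_text, the transformation from UTF-16 to UTF-8 is small enough to sketch by hand. The function below is only an illustration: utf16_to_utf8 is a hypothetical helper name, it takes a std::u16string so that it stays portable (on Windows a std::wstring holds the same UTF-16 code units), and replacing unpaired surrogates with U+FFFD is a choice made for this sketch. On Windows, WideCharToMultiByte with CP_UTF8 performs the same kind of conversion. wcstombs_s, by contrast, converts to the multibyte encoding of the current C locale, which is usually not UTF-8 and would explain the incorrect data.

```cpp
#include <cstddef>
#include <string>

// Convert UTF-16 code units to a UTF-8 byte string.
// Assumption for this sketch: unpaired surrogates become U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    out.reserve(in.size() * 3);  // upper bound: 3 bytes per UTF-16 code unit
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate without a preceding high surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The result can then be passed to sqlite3_bind_text as a pointer plus byte length. The loop is a single linear pass, so the conversion cost is likely to be small next to the Active Directory query and the database write.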

Related

OpenCV imread with foreign characters

We're working on a project using OpenCV 2.4.6 and Qt 5.1.1 in C++. We have to load images for image processing at several points in our code, which we did using cv::imread, as normal. However, we wanted to make software compatible with other language filesystems, and found that having file paths with foreign characters would fail to load.
The problem, we believe, has to do with the fact that imread can only take in a std::string (or char*), and casting a path with non Latin-1 symbols to a std::string results in characters that use multiple bytes in UTF-8 (the encoding used for QString, which is how we store the paths) being converted to multiple chars.
To confirm that the file paths are valid, we've opened them by passing a wstring to a regular ifstream, which successfully opens the file and reads bits. Our current hack is to load the image as a QImage and then copy the data to a cv::Mat, but this isn't a satisfying solution, for multiple reasons, chiefly that as we understand it, Qt::QImage loads images in 8bit format, and our images are of a higher bit depth.
Is there any "clean" way to get around this? I saw this question, but toAscii is deprecated, and its replacements didn't work for us. We've tried the following ways of converting the QString to a std::string and passing the result to imread.
QString::toStdString(), QString::toUtf8().data(), QString::toLocal8Bit().data(), QString::toLatin1().data(). They all appear to yield roughly the same results.
Thanks in advance.
You can try QString::toStdWString() and then convert the resulting std::wstring to std::string.
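Another route that avoids converting the path at all is to read the encoded file into memory through a stream that accepts the path as-is, and then decode from the buffer instead of from a file name (OpenCV provides cv::imdecode for decoding an in-memory buffer). The sketch below covers only the reading step. read_file_bytes is a hypothetical helper, not part of OpenCV or Qt, and opening a std::ifstream from a std::wstring relies on the MSVC extension the question already used; with a std::string path it is standard C++.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read a whole file into memory. Path may be any type std::ifstream can be
// opened with: std::string everywhere, std::wstring on MSVC (an extension).
// Returns an empty vector if the file cannot be opened.
template <typename Path>
std::vector<unsigned char> read_file_bytes(const Path& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return std::vector<unsigned char>();
    }
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}

// With OpenCV linked in, the buffer could then be decoded along these lines
// (not compiled here, shown as an assumption about how you would wire it up):
//   cv::Mat img = cv::imdecode(bytes, CV_LOAD_IMAGE_ANYDEPTH | CV_LOAD_IMAGE_ANYCOLOR);
```

Because the image is decoded by OpenCV rather than copied from a QImage, this should also keep the original bit depth, provided the decode flags ask for it.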

Mongodb c++ driver: string encoding

While working on a C++ project we decided to use MongoDB to store some of our application's data. I spent a week compiling and linking the C++ driver, and it works now. But there is one problem: strings like
bob.append("name", "some text with cyrilic symbols абвгд");
are stored incorrectly, and after being read back from the database they look like 4-5 Chinese characters.
I have found no documentation about using Unicode in MongoDB, so I do not understand how to write Unicode text to the database.
Your example, and the example code in the C++ tutorial on mongodb.org, work fine for me on Ubuntu 11.10. My locale is en_US.UTF-8, and the source files I create are UTF-8.
MongoDB stores data in BSON, and BSON strings are UTF-8, and UTF-8 can handle any Unicode character (including Cyrillic). I think the C++ API assumes strings are UTF-8 encoded, but I'm not sure.
Here are some ideas:
If your code above (bob.append("name"... etc) is in a C++ source code file, try encoding that file as UTF-8.
Try inserting Unicode characters via the mongodb shell.
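To test the first idea, you could check whether the bytes of the string you are about to append really are valid UTF-8. The function below is a minimal sketch of such a check. is_valid_utf8 is a hypothetical helper, not part of the MongoDB driver, and it only validates the byte sequence; it cannot tell you which other encoding the bytes are in if the check fails.

```cpp
#include <cstddef>
#include <string>

// Return true if s is a well-formed UTF-8 byte sequence.
// Rejects truncated sequences, overlong forms, surrogates and values
// above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    static const unsigned long min_cp[] = {0x0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;   // number of continuation bytes expected
        unsigned long cp = 0;    // decoded code point
        if (c < 0x80)                { extra = 0; cp = c; }
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else return false;       // stray continuation byte or invalid lead byte
        if (extra > s.size() - i - 1) return false;  // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_cp[extra]) return false;              // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;    // surrogate range
        if (cp > 0x10FFFF) return false;                   // beyond Unicode
        i += extra + 1;
    }
    return true;
}
```

If the check fails for a literal such as the Cyrillic one above, the source file (or the compiler's execution character set) is probably using a single-byte code page rather than UTF-8.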

C++ Unicode Encryption Library Required (Or is it?)

I need to encrypt several pieces of text in a file alongside unencrypted text in the same file. All the data is Unicode text.
None of the encryption libraries I have looked at (Crypto++, Botan, etc.) appear to provide Unicode-aware methods for encrypting or decrypting data; e.g. data is passed in and out as char or string rather than wchar or wstring. Does this matter? Just looking for some guidance.
Encryption libraries will use your data as a binary blob, not as characters. So it doesn't matter in what encoding the data is.
Encoding only affects interpretation of the data, not the data itself.
In other words: it doesn't matter.
Encryption works at the byte level and always takes a binary blob as input, so it does not matter which encoding you use to interpret the data.
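A small illustration of that point, using a toy byte transform in place of a real cipher. XOR with a repeating key is NOT secure and is used here only because it is short and reversible; xor_bytes is a hypothetical helper and has nothing to do with the Crypto++ or Botan APIs.

```cpp
#include <cstddef>
#include <string>

// Toy stand-in for encrypt/decrypt: XOR every byte with a repeating key.
// Applying it twice with the same key returns the original bytes.
// This is for illustration only and offers no real security.
std::string xor_bytes(const std::string& data, const std::string& key) {
    std::string out = data;
    if (key.empty()) {
        return out;  // nothing to combine with; return the input unchanged
    }
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<char>(out[i] ^ key[i % key.size()]);
    }
    return out;
}
```

The function never looks at characters, only at bytes, so the same text stored as UTF-8 or as UTF-16 round-trips equally well. With a real library you would pass a wstring the same way: as a pointer to its bytes plus a length of size() * sizeof(wchar_t), and interpret the decrypted bytes with the same encoding afterwards.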

How can I download a utf-8-encoded web page with libcurl, preserving the encoding?

I'm trying to get libcurl to download a webpage that is encoded in UTF-8, which is working fine, except that it converts the page to ASCII and screws up some of the characters. Is there an easy way to get it to keep the page in UTF-8?
libcurl doesn't translate/convert the data at all so there's actually nothing particular you need to do. Just get it.
Check the CURL options for conversion. They might have been defined at compilation time.

How can I get Django to output bad characters instead of returning an error

I've got some weird characters in my database, which seem to mess up Django when returning a page. I get this error:
TemplateSyntaxError at /search/legacy/
Caught an exception while rendering: Could not decode to UTF-8 column 'maker' with text 'i� G�r'
(the actual text is slightly different, but since it is a company name i've changed it)
How can I get Django to output this text? I'm currently running the site from SQLite (fast dev); is this the issue?
Also, on a completely unrelated note, is it possible to use a database view?
thanks
Probably not.
Django is using UTF-8 Strings internally, and it seems that your database returns some invalid string. You should fix the data in the database and use exclusively UTF-8 in all your application (data import, database, templates, source files, ...).
I have a related problem with a site owner who uses Apple's iPages for article creation, then does a copy-paste into a Django admin textbox. This process creates 'funny characters' that screw up Django and/or MySQL (you wouldn't believe the number of different double-left/right quote characters there are). I can't 'fix' the customer so I have a function that looks for known strangeness and translates it to something useful before. A complete PITA.
That's a bit of a confusing error message, and without knowing more details I'm not clear what the source of the problem is (the error message phrasing "decode to UTF-8" seems wrong, as normally you would encode to UTF-8). Perhaps Django is expecting to find data in some other encoding and is trying to decode it and re-encode as UTF-8, but is choking on some characters that aren't valid for the encoding it's expecting?
In general, you want to make sure that you're storing UTF-8 in your database, and that internally you're using unicode objects (not str objects) everywhere in your code.
Some other reading that may be helpful:
Unicode in the real world
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Django Tips: UTF-8, ASCII Encoding Errors, Urllib2, and MySQL