Unicode strings in C++ with Boost - C++

I want to use Unicode strings in C++ with a library that implements most of the common string routines. I would like to stay with the Boost libraries, and I found the Locale library, but it does not seem to be widely used. Is that right? What can you say about this library from your experience? Are there any other Boost libraries that implement Unicode string routines?
UPDATE:
Using other libraries is a problem in some of my modules. I don't want to tie them to a lot of different libraries (Boost is OK), but I need Unicode string routines (maybe a class). Why Unicode? Some characters in the strings may be Japanese or come from other languages, and they must be handled just like English characters.

Please excuse the self-promotion, but you might be interested in an answer I wrote that compares boost::locale to std::locale: What are the tradeoffs between boost::locale and std::locale?
Depending on what you need to do with your text, boost::locale is probably the best approach to adding Unicode support to your C++ code. This is especially true if you need cross-platform support, or you want to use UTF-8 on Windows.

Related

Unicode strings in pure C++

How to arrange correct processing of Unicode strings using pure C++?
What I mean is, when you put a Unicode string into std::string and count its length, you sometimes get something like 10 characters for a 5-character string.
How do they do it in serious open-source programs? How do they do it in a cross-platform manner? How do you tie it to file i/o and stdin/stdout streams?
Thanks.
There's Boost.Locale, which is written in C++, wraps the ICU library, and provides a nice, non-alien interface to it.
For Unicode work, my first choice would be Boost.Locale, followed by ICU directly (if there is something that Boost.Locale doesn't wrap yet).
Contrary to popular belief, std::[w]string has no Unicode support whatsoever. Both variants operate only on [w]char[_t] units, in an encoding-agnostic way.
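To make the point about units concrete, here is a minimal sketch (my own illustration, not taken from any of the libraries mentioned here). It assumes the std::string holds valid UTF-8; the helper name count_code_points is made up for the example. size() reports bytes, while the helper counts code points by skipping continuation bytes.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a string that is ASSUMED to hold valid UTF-8.
// Every byte that is not a continuation byte (bit pattern 10xxxxxx) starts a
// new code point, so counting those bytes counts the code points.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For a string such as "h\xC3\xA9llo" ("héllo" in UTF-8), size() gives 6 while the helper gives 5. Note that code points are still not the same thing as user-perceived characters, which is where a real Unicode library comes in.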
If you only need basic Unicode support in the form of length and conversions and encoding verification, there is utfcpp, which provides a beautiful C++ interface for these operations.
Application frameworks like Qt and wxWidgets do provide their own string classes, which offer better Unicode support, but they often tie you to using the whole framework throughout your code.
Aside from that, there is ICU, which is the standard Unicode implementation around today.
A work in progress by one of the C++ masters on this website is ogonek. You can surely contact the author through the Lounge<C++> StackOverflow chat room to ask for details on his progress.
This is how: http://www.utf8everywhere.org
Have you checked http://site.icu-project.org already?
ICU is currently the Unicode library. If you want cross-platform Unicode support, ICU is basically the only place to get it.
If only its interface wasn't more unfriendly than the wrong end of an automatic shotgun.
I've used wxWidgets to do this. It makes for easy conversion from std::string to their string type wxString. It's not ideal, but it works well, is simple and portable.

Is there any library to determine whether a numerical value can be translated to a valid, printable, and meaningful character in a specific charset?

I hope there is a library out there that provides such functionality, so that I do not need to dig too deeply into the charset specifications.
C++, and hopefully Chinese, and hopefully Windows.
Yes, ICU is a mature library providing Unicode and Globalization support. Among other things it provides easy access to all of the many Unicode character properties, Unicode Normalization, Case Folding and other fundamental operations as specified by the Unicode Standard.
I have not tried to program with it myself, but in the Unix world the GNU library libiconv is very widely used. It is also available for Windows. It is probably a bit slimmer than ICU.

What is the best way to use Unicode in C++ on iPhone?

I want to create my C++ libraries with Unicode support so they can be reused on other platforms.
I have found the ICU (International Components for Unicode) project, but I also found a discussion about Apple rejecting apps for using ICU.
So how do you guys use Unicode in C++ on iPhone? Thanks.
What do you want to use, in Unicode?
If you want to manipulate Unicode strings and format or parse things according to a locale, the standard APIs should be enough: std::wstring, std::locale, iconv(), etc…
The iPhone uses ICU internally. Check About»Legal.

How to do Latin1-UTF8 encoding change in C++ (maybe with Boost)?

My code base mostly uses UTF-8, but some older library has Windows Latin1 encoded strings hardcoded within it.
I was hoping Boost would have a clear conversion feature, but I did not find one. Do I really need to hand-code such a commonplace solution?
Looking for a portable solution, running on Linux.
(This Q is similar, but not quite the same)
Edit: ICU seems to be the right answer, but it's a bit overkill for my needs. I ended up doing string-replace for the known few extended chars that were used.
International Components for Unicode (ICU) does have the solutions you are looking for. Boost can be compiled with support for ICU, e.g. for Boost regular expressions, but precompiled versions of Boost usually don't include it.
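If pulling in ICU is overkill, the hand-coded route is short for plain Latin-1. The following is only a sketch under one loud assumption: the input is true ISO-8859-1, where every byte value equals its Unicode code point. "Windows Latin1" (Windows-1252) differs in the 0x80-0x9F range, so those bytes would need a lookup table that this sketch does not include. The function name latin1_to_utf8 is made up for the example.

```cpp
#include <string>

// Converts ISO-8859-1 (Latin-1) bytes to UTF-8.
// ASSUMPTION: the input is true ISO-8859-1. Windows-1252 assigns different
// characters to bytes 0x80-0x9F (for example, 0x80 is the euro sign), and
// those bytes are NOT handled correctly here.
std::string latin1_to_utf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size() * 2);  // worst case: every byte becomes two
    for (unsigned char byte : latin1) {
        if (byte < 0x80) {
            // ASCII range is identical in Latin-1 and UTF-8.
            utf8.push_back(static_cast<char>(byte));
        } else {
            // Code points U+0080..U+00FF take exactly two bytes in UTF-8.
            utf8.push_back(static_cast<char>(0xC0 | (byte >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
        }
    }
    return utf8;
}
```

For example, the Latin-1 bytes "caf\xE9" come out as "caf\xC3\xA9".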

Unicode Processing in C++

What is the best practice of Unicode processing in C++?
Use ICU for dealing with your data (or a similar library)
In your own data store, make sure everything is stored in the same encoding
Make sure you are always using your Unicode library for mundane tasks like string length, capitalization status, etc. Never use standard library builtins like isalpha unless that is the definition you want.
I can't say it enough: never iterate over the indices of a string if you care about correctness; always use your Unicode library for this.
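To show why indexing goes wrong, here is a rough sketch of what stepping through UTF-8 by code point involves. It is an illustration only, not a replacement for ICU: the name decode_utf8 is made up, and the decoder is deliberately simplified (it rejects bad lead bytes and truncated sequences, but does not reject overlong encodings or surrogate code points).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes UTF-8 into a sequence of code points.
// Simplified sketch: throws on invalid lead bytes, bad continuation bytes and
// truncated input, but does NOT reject overlong encodings or surrogates.
std::u32string decode_utf8(const std::string& utf8) {
    std::u32string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80) {
            cp = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            cp = lead & 0x1F; extra = 1;
        } else if ((lead & 0xF0) == 0xE0) {
            cp = lead & 0x0F; extra = 2;
        } else if ((lead & 0xF8) == 0xF0) {
            cp = lead & 0x07; extra = 3;
        } else {
            throw std::invalid_argument("invalid UTF-8 lead byte");
        }
        if (i + extra >= utf8.size()) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(utf8[i + k]);
            if ((cont & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (cont & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;  // step by a whole code point, never by one index
    }
    return out;
}
```

Even this only gets you code points. Anything that depends on user-perceived characters, case, or collation still belongs in the Unicode library.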
If you don't care about backwards compatibility with previous C++ standards, the current C++11 standard has built in Unicode support: http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2011/n3242.pdf
So the truly best practice for Unicode processing in C++ would be to use the built in facilities for it. That isn't always a possibility with older code bases though, with the standard being so new at present.
EDIT: To clarify, C++11 is Unicode aware in that it now has support for Unicode literals and Unicode strings. However, the standard library has only limited support for Unicode processing and conversion. For your current needs this may be enough. However, if you need to do a large amount of heavy lifting right now then you may still need to use something like ICU for more in-depth processing. There are some proposals currently in the works to include more robust support for text conversion between different encodings. My guess (and hope) is that this will be part of the next technical report.
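As a small illustration of what "Unicode literals and Unicode strings" means in practice, here is a sketch using only the C++11 prefixes u8, u and U together with std::u16string and std::u32string. The function names are made up for the example. It also shows the limitation mentioned above: the strings store encoded code units, and size() counts those units, not characters.

```cpp
#include <string>

// UTF-16 literal: U+00E9 fits in one 16-bit code unit, while U+1F600 lies
// outside the Basic Multilingual Plane and is stored as a surrogate pair.
inline std::u16string sample_utf16() { return u"\u00E9\U0001F600"; }

// UTF-32 literal: exactly one 32-bit code unit per code point.
inline std::u32string sample_utf32() { return U"\u00E9\U0001F600"; }

// UTF-8 literal: sizeof counts bytes including the terminating null,
// so 2 bytes (U+00E9) + 4 bytes (U+1F600) + 1 = 7.
static_assert(sizeof(u8"\u00E9\U0001F600") == 7,
              "u8 literals are stored as UTF-8 bytes");
```

So the same two code points occupy 3 units in the UTF-16 string and 2 in the UTF-32 string. Anything beyond storage, such as normalization or case mapping, is where a library like ICU still comes in.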
Our company (and others) uses the open source International Components for Unicode (ICU) library, originally developed by Taligent.
It handles strings, locales, conversions, dates/times, collation, transformations, etc.
Start with the ICU User Guide.
Here is a checklist for Windows programming:
All strings enclosed in _T("my string")
strlen() etc. functions replaced with _tcslen() etc.
Use LPTSTR and LPCTSTR instead of char * and const char *
When starting new projects in Dev Studio, religiously make sure the Unicode option is selected in your project properties.
For C++ strings, use std::wstring instead of std::string
Look at: Case insensitive string comparison in C++
That question has a link to the Microsoft documentation on Unicode: http://msdn.microsoft.com/en-us/library/cc194799.aspx
If you look on the left-hand navigation side on MSDN next to that article, you should find a lot of information pertaining to Unicode functions. It is part of a chapter on "Encoding Characters" (http://msdn.microsoft.com/en-us/library/cc194786.aspx)
It has the following subsections:
The Code-Page Model
Double-Byte Character Sets in Windows
Unicode
Compatibility Issues in Mixed Environments
Unicode Data Conversion
Migrating Windows-Based Programs to Unicode
Summary
Although this may not be best practice for everyone, you can write your own C++ UNICODE routines if you want!
I just finished doing it over a weekend and learned a lot. I don't guarantee it's 100% bug free, but I did a lot of testing and it seems to work correctly.
My code is under the New BSD license and can be found here:
http://code.google.com/p/netwidecc/downloads/list
It is called WSUCONV and comes with a sample main() program that converts between UTF-8, UTF-16, and Standard ASCII. If you throw away the main code, you've got a nice library for reading / writing UNICODE.
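For a feel of what such conversion routines involve, here is a minimal sketch of one direction, UTF-32 code points to UTF-8. To be clear, this is not the WSUCONV code; the names append_utf8 and utf32_to_utf8 are made up for this illustration.

```cpp
#include <stdexcept>
#include <string>

// Appends the UTF-8 encoding of a single code point to `out`.
// Rejects values that are not Unicode scalar values (above U+10FFFF or in
// the surrogate range U+D800..U+DFFF).
void append_utf8(std::string& out, char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        throw std::invalid_argument("not a Unicode scalar value");
    }
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {                               // 4 bytes: 11110xxx then three 10xxxxxx
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Converts a whole UTF-32 string to UTF-8 by encoding each code point in turn.
std::string utf32_to_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        append_utf8(out, cp);
    }
    return out;
}
```

The other directions (UTF-8 decoding, UTF-16 surrogate pairs) follow the same pattern of bit masks and range checks, which is why a weekend is a realistic estimate for the basics and why thorough testing matters.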
As has been said above, a library is the best bet when working on a large system. However, sometimes you do want to handle things yourself (maybe because the library would use too many resources, as on a microcontroller). In this case you want a simple library that you can copy parts out of for the things you actually need.
Willow Schlanger's example code seems like a good one (see his answer for more details).
I also found another one that has less code. It lacks full error checking and only handles UTF-8, but it was simpler to take parts out of.
Here's a list of the embedded libraries that seem decent.
Embedded libraries
http://code.google.com/p/netwidecc/downloads/list (UTF8, UTF16LE, UTF16BE, UTF32)
http://www.cprogramming.com/tutorial/unicode.html (UTF8)
http://utfcpp.sourceforge.net/ (Simple UTF8 library)
Use IBM's International Components for Unicode
Have a look at the recommendations of UTF-8 Everywhere