POSIX locale for different Unicode scripts to use with Boost - C++

I am using boost::filesystem::ofstream for Unicode std::string output. I am stuck on choosing the right Boost locale/encoding to support all languages (across all Unicode versions up to 6.3). The code is targeted to compile on both VS2010 and GCC 4.8:
#include <boost/locale.hpp>
#include <boost/filesystem/fstream.hpp>

namespace loc = boost::locale;

loc::generator gen;
std::locale _loc = gen.generate("en_US.utf-8"); // use the right POSIX locale/encoding
                                                // to support different versions of Unicode
                                                // and different compilers
std::string str = "my unicode string";
boost::filesystem::ofstream _file("my file.txt");
_file.imbue(_loc);
_file << str;
I am trying to understand the different Unicode versions, encodings, and the locale support offered by different compilers.

Related

Cyrillic characters are not saved in char16_t under MSVC with CMake build

I ran into the following problem: when I try to store a Cyrillic character in a char16_t variable, I get error C2015 ("too many characters in constant"). The most interesting thing is that it happens only when I build with MSVC via CMake. If I build directly in Visual Studio, or via CMake with another compiler, it works fine. Moreover, in the same configuration that fails with MSVC, Cyrillic characters can be stored in wchar_t. I use the Visual Studio 16 generator.
Example of broken code:
char16_t val = u'б';
EDIT:
The trouble is with the UTF literal. If I use this instead:
char16_t val = some_int_val;
it works correctly. The IDE even shows the correct integer value for the literal, but everything breaks at compile time.
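A likely cause (an assumption; the question doesn't confirm it) is that the source file is UTF-8 without a BOM, so MSVC decodes 'б' using the system codepage and sees multiple bytes in the character constant. Two common workarounds, sketched below:

// Hypothetical workarounds, not from the original question.
// 1) A universal character name avoids depending on the source file's
//    byte encoding entirely:
char16_t val = u'\u0431'; // U+0431 CYRILLIC SMALL LETTER BE ('б')

// 2) Alternatively, tell MSVC the source is UTF-8. In CMake, e.g.:
//    target_compile_options(my_target PRIVATE
//        "$<$<CXX_COMPILER_ID:MSVC>:/utf-8>")   // my_target is illustrative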

C++ Use of wstring_convert on Linux

I would like to be able to convert text read from a file into multibyte characters.
I have the following C++ code on Windows that is working for me.
When I try to compile the code on Linux, though, it fails.
#include <locale>
#include <codecvt> // std::codecvt_utf8_utf16 lives here, not in <locale>
....
std::string line;
while (std::getline(infile, line))
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> utfconv;
    std::wstring widestr = utfconv.from_bytes(line);
This throws the following errors:
wstring_convert is not a member of std
codecvt_utf8_utf16 is not a member of std
expected primary-expression before wchar_t
I'm using GCC Red Hat 4.4.7-4.
Based on what I read, I've included <locale>, but the compiler still can't find it.
If wstring_convert is not available, is there something equivalent that I can do?
Most likely your language standard is not set properly. std::wstring_convert was introduced in C++11 (and deprecated again in C++17), so you need to add the compiler flag -std=c++11 or -std=c++14.
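For example (the file name convert.cpp is illustrative):
g++ -std=c++11 convert.cpp -o convert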
EDIT:
Just saw the GCC version you're using. It's way out of date. Everything should work fine if you upgrade to GCC 4.9.x or above.
You will have to use a newer GCC version. Precompiled versions are part of Developer Toolset (available as part of most Red Hat Enterprise Linux subscriptions) or as part of Software Collections for CentOS:
devtoolset-7
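A typical way to activate the toolset for the current shell (assuming the devtoolset-7 package from Software Collections is installed):
scl enable devtoolset-7 bash   # start a shell with the toolset's GCC first in PATH
g++ --version                  # should now report the toolset's newer GCC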

Set execution character set for Visual C++ compiler

Is it possible to set the execution character set for Visual C++ compiler?
Problem
When trying to convert a string literal written with universal character names (UCNs) to a wide string, a runtime crash happens when compiling with Visual Studio 2015:
std::string narrowUCN = "\u00E4\u00F6\u00FC\u00DF\u20AC\u0040";
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convertWindows;
std::wstring wide = convertWindows.from_bytes(narrowUCN); // Unhandled C++ exception in xlocbuf, line 426.
Using narrowUCN = u8"\u00E4\u00F6\u00FC\u00DF\u20AC\u0040" works, so I assume this is a problem with the execution character set?
Since Visual Studio 2015 Update 2, it is possible to set the execution character set to UTF-8 using the compiler option /utf-8. The conversion of narrow string literals that don't use u8 will then work, because those literals are encoded as UTF-8 instead of in the system codepage (the Visual C++ compiler's default behaviour).
The option /utf-8 is a synonym for /source-charset:utf-8 and /execution-charset:utf-8. From the Microsoft documentation:
In those cases where BOM-less UTF-8 files already exist or where changing to a BOM is a problem, use the /source-charset:utf-8 option to correctly read these files.
Use of /execution-charset or /utf-8 can help when targeting code between Linux and Windows as Linux commonly uses BOM-less UTF-8 files and a UTF-8 execution character set.
PS: Don't confuse this with the "Character Set" setting on the general project configuration page, which only sets the UNICODE/MBCS preprocessor macros (for historical reasons).
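For reference, a minimal command-line invocation using the option directly (the file name is illustrative):
cl /utf-8 /EHsc main.cpp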

Apple C++ LLVM Compiler 4.x & UNICODE: when is it needed? Is UNICODE the compiler's default charset? Making your code compile as both ANSI and UNICODE versions

I have experience with the Microsoft C++ compiler, where you can toggle between UNICODE and non-UNICODE compilation paths very simply.
The following construction is legitimate and works as expected:
#ifdef UNICODE
typedef std::wstring string;
#else
typedef std::string string;
#endif
But how can I handle the same situation with Apple LLVM compiler?
P.S. GCC hints will also be appreciated.
UPDATE:
In Windows programming it is better to use UNICODE strings (especially if you work heavily with WinAPI, which is UNICODE-based). Are there any reasons to use wstring instead of string (apart from the charset differences) with LLVM or GCC on OS X and iOS?
It's arguable whether you should even care about supporting multiple string types (it depends on the application), but the following should work:
#if defined(_WIN32) && defined(UNICODE)
typedef std::wstring string;
#else
typedef std::string string;
#endif
Also, read the following post to learn all about the different types of strings and their use cases: std::wstring VS std::string
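If you also want string literals to follow the typedef, a helper macro in the spirit of Windows' _T() can be added. A minimal sketch (MY_TEXT is an illustrative name, not a standard API):

#include <string>

#if defined(_WIN32) && defined(UNICODE)
typedef std::wstring string;
#define MY_TEXT(s) L##s  // expands "hello" to L"hello"
#else
typedef std::string string;
#define MY_TEXT(s) s
#endif

int main() {
    string greeting = MY_TEXT("hello"); // picks the right literal type per build
}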

clang++ and u16string

I'm having a hell of a time with this simple line of code and the latest clang++
#include <stdio.h>
#include <stdlib.h> // EXIT_SUCCESS lives here, not in <stdio.h>
#include <string>
using std::u16string;
int main ( int argc, char** argv )
{
    u16string s16 = u"鵝滿是快烙滴好耳痛";
    return EXIT_SUCCESS;
}
Ben-iMac:Desktop Ben$ clang++ -std=c++0x -stdlib=libc++ main.cpp -o main
main.cpp:15:21: error: use of undeclared identifier 'u'
u16string s16 = u"鵝滿是快烙滴好耳痛"
The latest released versions of clang (v2.9 from llvm.org and Apple's clang 3.0) do not support Unicode string literals. The latest version, built from top-of-trunk source, does support them.
The next llvm.org release of clang (i.e., 3.0) will support the Unicode string literal syntax, but has no support for any source file encoding beyond ASCII. So even with that release you won't be able to type those characters literally in your source and have them converted to a UTF-16 encoded string value; instead you'll have to use \u escapes. Top of trunk does support UTF-8 source code, but that didn't land in time for the llvm.org 3.0 release currently under testing. The release after that (in six months or so) should have better support for UTF-8 source code (but not for other source encodings).
Edit: The Xcode 4.3 version of clang does have these features.
Edit: And now the 3.1 release from LLVM.org has them
So clang now fully supports the following:
#include <string>
int main() {
std::u16string a = u"鵝"; // UTF-8 source is transformed into UTF-16 literal
std::u32string b = U"滿"; // UTF-8 source is transformed into UTF-32 literal
}
It turns out the standard does not actually require much support for char16_t and char32_t in the iostreams library, so you'll probably have to convert to another string type to get much use out of this. At least the ability to convert between these and the more useful std::string is required (though not exactly convenient to set up...).
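As a sketch of one such conversion, using the C++11 <codecvt> facilities (later deprecated in C++17):

#include <codecvt>
#include <locale>
#include <string>

int main() {
    std::u16string s16 = u"\u9D5D"; // the character 鵝 as a UCN escape
    // wstring_convert + codecvt_utf8_utf16<char16_t> converts between
    // UTF-16 (char16_t) and UTF-8 (char) sequences.
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    std::string utf8 = conv.to_bytes(s16); // UTF-8 bytes, usable with std::cout etc.
}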
You can test clang for individual C++11 features, e.g.:
http://clang.llvm.org/docs/LanguageExtensions.html#cxx_unicode_literals
and here's a status page:
http://clang.llvm.org/cxx_status.html
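For example, Clang's __has_feature extension (documented at the first link) allows a compile-time probe for Unicode literal support:

// Fallback for compilers that don't implement __has_feature at all.
#ifndef __has_feature
#define __has_feature(x) 0
#endif

#if __has_feature(cxx_unicode_literals)
// u"..." and U"..." literals are available here
#endif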