xlocale broken on OS X? - c++

I have a simple program that tests converting between wchar_t and char using a series of locales passed to it on the command line. It outputs a list of the conversions that fail by printing out the locale name and the string that failed to convert.
I'm building it using clang and libc++. My understanding is that libc++'s named locale support is provided by the xlocale library on OS X.
I'm seeing some unexpected failures, as well as some instances where conversion should fail, but doesn't.
Here's the program.
#warning call this program like: "locale -a | ./a.out" or pass \
locale names valid for your platform, one per line via standard input
#include <iostream>
#include <codecvt>
#include <locale>
#include <array>
#include <utility> // std::forward

template <class Facet>
class usable_facet : public Facet {
public:
    // FIXME: use inheriting constructors when available
    // using Facet::Facet;
    template <class ...Args>
    usable_facet(Args&& ...args) : Facet(std::forward<Args>(args)...) {}
    ~usable_facet() {}
};

int main() {
    std::array<std::wstring, 11> args = {L"a", L"é", L"¤", L"€", L"Да", L"Ψ", L"א", L"আ", L"✈", L"가", L"𐌅"};
    std::wstring_convert<usable_facet<std::codecvt_utf8<wchar_t>>> u8cvt; // wchar_t uses UCS-4/UTF-32 on this platform

    int convert_failures = 0;
    std::string line;
    while (std::getline(std::cin, line)) {
        if (line.empty())
            continue;

        using codecvt = usable_facet<std::codecvt_byname<wchar_t, char, std::mbstate_t>>;
        std::wstring_convert<codecvt> convert(new codecvt(line));

        for (auto const &s : args) {
            try {
                convert.to_bytes(s);
            } catch (std::range_error &e) {
                convert_failures++;
                std::cout << line << " : " << u8cvt.to_bytes(s) << '\n';
            }
        }
    }

    std::cout << std::string(80, '=') << '\n';
    std::cout << convert_failures << " wstring_convert to_bytes failures.\n";
}
Here are some examples of correct output
en_US.ISO8859-1 : €
en_US.US-ASCII : ✈
Here's an example of output that is not expected
en_US.ISO8859-15 : €
The euro character does exist in the ISO 8859-15 charset and so this should not be failing.
Here are examples of output that I expect but do not receive
en_US.ISO8859-15 : ¤
en_US.US-ASCII : ¤
This is the currency sign, which exists in ISO 8859-1 but was removed and replaced with the euro sign in ISO 8859-15. This conversion should not be succeeding, yet no error is signaled. Examining this case further, I find that in both cases '¤' is converted to 0xA4, which is the ISO 8859-1 representation of '¤'.
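A quick way to cross-check these results independently of libc++ is to ask the C library directly what byte, if any, a given wide character maps to in a named locale. The following is only a sketch: it assumes a POSIX.1-2008 system where newlocale/uselocale/freelocale are declared in <locale.h>, and it uses the same platform-specific locale names as above.
#include <climits>  // MB_LEN_MAX
#include <cstdio>
#include <cwchar>   // std::wcrtomb, std::mbstate_t
#include <locale.h> // newlocale, uselocale, freelocale

static void probe(const char *name, wchar_t wc)
{
    locale_t loc = newlocale(LC_ALL_MASK, name, (locale_t)0);
    if (!loc) { std::printf("%s: newlocale failed\n", name); return; }
    locale_t old = uselocale(loc);

    std::mbstate_t st{};
    char buf[MB_LEN_MAX];
    std::size_t n = std::wcrtomb(buf, wc, &st);
    if (n == (std::size_t)-1)
        std::printf("%s: U+%04X not representable\n", name, (unsigned)wc);
    else
        std::printf("%s: U+%04X -> 0x%02X\n", name, (unsigned)wc, (unsigned char)buf[0]);

    uselocale(old);
    freelocale(loc);
}

int main()
{
    probe("en_US.ISO8859-1",  L'\u20AC'); // euro sign: should be "not representable"
    probe("en_US.ISO8859-15", L'\u20AC'); // euro sign: should be 0xA4
    probe("en_US.ISO8859-15", L'\u00A4'); // currency sign: should be "not representable"
}
If the C library itself maps '¤' to 0xA4 under en_US.ISO8859-15, the problem lies in the locale data or xlocale rather than in libc++.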
I'm not using xlocale directly, only indirectly via libc++. Is xlocale on Mac OS X simply broken with bad locale definitions? Is there a way to fix it? Or are the issues I'm seeing a result of something else?

I suspect you are seeing problems with the xlocale system. A bug report would be most appreciated!

I don't know why you're expecting wchar_t to be UTF-32, or where you heard that it is "OS X's convention that wchar_t is UTF-32." That is certainly incorrect; wchar_t is only 16 bits wide.
See http://en.wikipedia.org/wiki/Wide_character for more information about wchar_t.
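Whether that claim holds is platform-dependent and easy to check directly; a minimal sketch:
#include <iostream>

int main()
{
    // Commonly prints 32 on OS X and Linux, and 16 on Windows.
    std::cout << "wchar_t is " << sizeof(wchar_t) * 8 << " bits wide\n";
}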

Related

Is there a way to ensure lazy evaluation of BOOST_TESTs message?

Is it possible to ensure that the message macro parameter of BOOST_TEST is only evaluated if the check actually fails? Lazy evaluation seems to happen for the human-readable output format, but not for the JUnit output format. Can I somehow make lazy evaluation work reliably for all output types?
MCVE
#define BOOST_TEST_MODULE my_module
#include <boost/test/included/unit_test.hpp>
#include <string>

struct S
{
    std::string m_value;
};

S* f(void)
{
    return nullptr;
}

BOOST_AUTO_TEST_CASE(foo_test)
{
    S* result = f();
    BOOST_TEST(result == nullptr, "f() should return nullptr but returned an object with m_value = " << result->m_value);
}
This code works fine if I use the human-readable output format (--log_format=HRF command line option for the executable). Using JUnit output (--log_format=JUNIT command line option) results in an access error, since the program attempts to get the size of a string at address 0.
This is a bit unexpected, since I'd assume the BOOST_TEST macro works similarly to this:
// not the real macro, but one that works in a way I'd expect it to work
#define BOOST_TEST(condition, message) if (!(condition)) { \
    some_output_stream << "The following test failed: " << #condition << std::endl \
                       << "Message: " << message; \
}
Tested configurations:
boost 1.75.0
Visual Studio 2017/2019 or g++ 8.x (Win 10 / Ubuntu 18)
64 bit
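One defensive workaround (a sketch, not an official Boost.Test recommendation) is to make the streamed message itself safe to evaluate eagerly, so that an output formatter which builds the message regardless of the check's outcome cannot dereference a null pointer:
BOOST_AUTO_TEST_CASE(foo_test_null_safe)
{
    S* result = f();
    // The ternary keeps the message expression valid even when result is null,
    // so eager evaluation by the JUNIT formatter no longer crashes.
    BOOST_TEST(result == nullptr,
               "f() should return nullptr but returned an object with m_value = "
               << (result ? result->m_value : std::string("<null>")));
}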

How does "C++filt" find the actual typenames?

Sorry if this is silly but I am puzzled by this.
I did Google my question but was not able to find any relevant results.
For the code below, compiled and written to a.out:
char x;
cout<<typeid(x).name()<<endl;
./a.out gave me c where I expected char. I found this SO question and concluded that we need to demangle the result using c++filt -t, so I did this:
./a.out | c++filt -t
and hurray! I got the demangled (not a dictionary word, I know) name char.
Fair enough!
But the question that perplexes me is: how did c++filt find that?
I double-checked what a pipe does here. If I understand correctly, it just passes the output, in this case c, to c++filt -t.
Where did c++filt look for the information?
How does the process of demangling work with c++filt?
The way mangling works depends on what your platform is, and therefore so does the way demangling works.
The programmers of c++filt looked at the specification that describes how symbols are mangled on your platform. Or, possibly, they simply call a function that's provided by the implementation, which demangles the symbols.
In the latter case, the people who implemented the compiler and therefore the demangling function, know how the symbols are mangled because they implemented the mangling in the first place.
c++filt is open-source software; you can read its source to find out what it does.
If you're interested in how to demangle symbols yourself, I recommend taking a look at the manual of your compiler.
As it is written in the man page:
c++filt copies each file name in sequence, and writes it on
the standard output after decoding symbols that look like
C++ mangled names.
c++filt handles Solaris Studio C++ legacy versions as well
as the current version.
c++filt reads from the standard input if no input file is
specified.
And here is a little introduction to demangling:
https://gcc.gnu.org/onlinedocs/libstdc++/manual/ext_demangling.html
#include <exception>
#include <iostream>
#include <typeinfo> // std::type_info
#include <cstdlib>  // free
#include <cxxabi.h>

struct empty { };

template <typename T, int N>
struct bar { };

int main()
{
    int status;
    char *realname;

    // exception classes not in <stdexcept>, thrown by the implementation
    // instead of the user
    std::bad_exception e;
    realname = abi::__cxa_demangle(e.what(), 0, 0, &status);
    std::cout << e.what() << "\t=> " << realname << "\t: " << status << '\n';
    free(realname);

    // typeid
    bar<empty, 17> u;
    const std::type_info &ti = typeid(u);
    realname = abi::__cxa_demangle(ti.name(), 0, 0, &status);
    std::cout << ti.name() << "\t=> " << realname << "\t: " << status << '\n';
    free(realname);

    return 0;
}
This prints
St13bad_exception => std::bad_exception : 0
3barI5emptyLi17EE => bar<empty, 17> : 0

Can't insert a number into a char16_t-based custom C++ ostream/streambuf

I have written a custom std::basic_streambuf and std::basic_ostream because I want an output stream that I can get a JNI string from in a manner similar to how you can call std::ostringstream::str(). These classes are quite simple.
namespace myns {

class jni_utf16_streambuf : public std::basic_streambuf<char16_t>
{
    JNIEnv * d_env;
    std::vector<char16_t> d_buf;

    virtual int_type overflow(int_type);

public:
    jni_utf16_streambuf(JNIEnv *);
    jstring jstr() const;
};

typedef std::basic_ostream<char16_t, std::char_traits<char16_t>> utf16_ostream;

class jni_utf16_ostream : public utf16_ostream
{
    jni_utf16_streambuf d_buf;

public:
    jni_utf16_ostream(JNIEnv *);
    jstring jstr() const;
};

// ...

} // namespace myns
In addition, I have made four overloads of operator<<, all in the same namespace:
namespace myns {

// ...

utf16_ostream& operator<<(utf16_ostream&, jstring) throw(std::bad_cast);
utf16_ostream& operator<<(utf16_ostream&, const char *);
utf16_ostream& operator<<(utf16_ostream&, const jni_utf16_string_region&);
jni_utf16_ostream& operator<<(jni_utf16_ostream&, jstring);

// ...

} // namespace myns
The implementation of jni_utf16_streambuf::overflow(int_type) is trivial. It just doubles the buffer width, puts the requested character, and sets the base, put, and end pointers correctly. It is tested and I am quite sure it works.
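For reference, an overflow() along those lines might look like the following; this is a sketch written against the declarations above, not the asker's actual implementation:
myns::jni_utf16_streambuf::int_type
myns::jni_utf16_streambuf::overflow(int_type ch)
{
    if (traits_type::eq_int_type(ch, traits_type::eof()))
        return traits_type::not_eof(ch);

    // Remember how much of the put area is used, grow the buffer, and
    // re-establish the put pointers before storing the new character.
    std::size_t used = pptr() - pbase();
    d_buf.resize(d_buf.empty() ? 16 : d_buf.size() * 2);
    setp(d_buf.data(), d_buf.data() + d_buf.size());
    pbump(static_cast<int>(used));

    *pptr() = traits_type::to_char_type(ch);
    pbump(1);
    return ch;
}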
The jni_utf16_ostream works fine inserting Unicode characters. For example, this works fine and results in the stream containing "hello, world":
myns::jni_utf16_ostream o(env);
o << u"hello, wor" << u'l' << u'd';
My problem is that as soon as I try to insert an integer value, the stream's bad bit gets set. For example:
myns::jni_utf16_ostream o(env);
if (o.bad()) throw "bad bit before"; // does not throw
int32_t x(5);
o << x;
if (o.bad()) throw "bad bit after"; // throws :(
I don't understand why this is happening! Is there some other method on std::basic_streambuf that I need to be implementing?
It looks like the answer is that char16_t support is only partly implemented in GCC 4.8. The library headers don't install facets needed to convert numbers. Here is what the Boost.Locale project says about it:
GNU GCC 4.5/C++0x Status
The GNU C++ compiler provides decent support for the C++0x character types, however:
The standard library does not install any std::locale::facet specializations for this support, so any attempt to format numbers on char16_t or char32_t streams simply fails. The standard library is also missing the required char16_t/char32_t locale facet specializations, so the "std" backend is not buildable (essential symbols are missing) and the codecvt facet cannot be created either.
Visual Studio 2010 (MSVC10)/C++0x Status
MSVC provides all the required facets, however:
The standard library does not export std::locale::id for these facets from the DLL, so they are not usable with the /MD or /MDd compiler flags and require linking the runtime library statically. In addition, char16_t and char32_t are not distinct types but aliases of unsigned short and unsigned int, which contradicts the C++0x requirements, making it impossible to write char16_t/char32_t to a stream and causing multiple faults.
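Until such facets are available, one possible workaround (a sketch reusing the asker's utf16_ostream typedef, not part of the original code) is to format numbers with the ordinary char facets and then widen the result, since the characters produced for a number are plain ASCII:
#include <string>

template <typename Number>
myns::utf16_ostream& put_number(myns::utf16_ostream& os, Number value)
{
    // std::to_string uses the char-based machinery, which does work; the
    // resulting digits, sign, and decimal point can be widened one by one.
    for (char c : std::to_string(value))
        os.put(static_cast<char16_t>(c));
    return os;
}

// usage: put_number(o, int32_t(5)); instead of o << 5;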

c++ code won't compile with GCC because of typeid().raw_name() - how can I fix this?

The following code compiles fine on Windows with Visual Studio:
class_handle(base *ptr) : ptr_m(ptr), name_m(typeid(base).raw_name()) { signature_m = CLASS_HANDLE_SIGNATURE; }
If I try to compile the same code on Linux I get:
error: ‘const class std::type_info’ has no member named ‘raw_name’
As far as I understand, raw_name is a Microsoft-specific extension. How do I have to change my code so that it compiles on both Windows and Linux systems?
EDIT1: I prefer not to modify the original code; I just need a workaround that compiles with gcc. Is that possible?
EDIT2: will #define raw_name name do the trick?
Write these:
// for variables:
template<typename T>
char const* GetRawName( T unused ) { ... }
// for types:
template<typename T>
char const* GetRawName() { ... }
with different implementations on Windows and not-on-Windows, using an #ifdef block on a token you know to be defined by the Microsoft compiler but not by your other compiler. This confines the preprocessor differences between the MS and non-MS builds to a single file.
This does require a minimal amount of change to the original code, but it does so in a way that will still compile on the Microsoft compiler.
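A minimal sketch of those wrappers, assuming _MSC_VER as the detection token (on GCC, type_info::name() already returns the mangled name, so it plays the role of raw_name()):
#include <typeinfo>

#ifdef _MSC_VER
// Microsoft compiler: type_info has the raw_name() extension.
template<typename T>
char const* GetRawName() { return typeid(T).raw_name(); }
#else
// GCC/Clang: name() is already the raw (mangled) name.
template<typename T>
char const* GetRawName() { return typeid(T).name(); }
#endif

// variable form forwards to the type form
template<typename T>
char const* GetRawName(T const&) { return GetRawName<T>(); }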
It's probably safer to #define typeid:
class compat_typeinfo {
    const std::type_info &ti;
public:
    explicit compat_typeinfo(const std::type_info &ti): ti(ti) {}
    const char *name() const { return ti.name(); }
    const char *raw_name() const { return ti.name(); }
};

compat_typeinfo compat_typeid(const std::type_info &ti) {
    return compat_typeinfo(ti);
}

#define typeid(x) compat_typeid(typeid(x))
Of course, this is illegal by 17.6.4.3.1p2 (A translation unit shall not #define or #undef names lexically identical to keywords [...]) but it's reasonably likely to work and requires minimal modification elsewhere.
GCC doesn't define raw_name but does include mangling/demangling in cxxabi.h. You can see an example of it here.
#include <cxxabi.h>
//...
std::bad_exception e;
realname = abi::__cxa_demangle(e.what(), 0, 0, &status);
std::cout << e.what() << "\t=> " << realname << "\t: " << status << '\n';
free(realname);

How to workaround gcc-3.4 bug (or maybe this is not a bug)?

The following code fails with an error message:
t.cpp: In function `void test()':
t.cpp:35: error: expected primary-expression before '>' token
t.cpp:35: error: expected primary-expression before ')' token
I don't see any issues with the code, and it compiles with gcc-4.x and MSVC 2005, but not with gcc-3.4 (which is still quite popular on some platforms).
#include <string>
#include <iostream>

struct message {
    message(std::string s) : s_(s) {}

    template<typename CharType>
    std::basic_string<CharType> str()
    {
        return std::basic_string<CharType>(s_.begin(), s_.end());
    }

private:
    std::string s_;
};

inline message translate(std::string const &s)
{
    return message(s);
}

template<typename TheChar>
void test()
{
    std::string s = "text";
    std::basic_string<TheChar> t1, t2, t3, t4, t5;
    t1 = translate(s).str<TheChar>();                  // ok
    char const *tmp = s.c_str();
    t2 = translate(tmp).str<TheChar>();                // ok
    t3 = message(s.c_str()).str<TheChar>();            // ok
    t4 = translate(s.c_str()).str<TheChar>();          // fails
    t5 = translate(s.c_str()).template str<TheChar>(); // ok
    std::cout << t1 << " " << t2 << " " << t3 << " " << t4 << std::endl;
}

int main()
{
    test<char>();
}
Is it possible to work around it at the level of the translate function and the message class? Or maybe my code is wrong; if so, where?
Edit:
Bugs related to template-functions in GCC 3.4.6 says I need to use the template keyword, but should I?
Is this a bug? Do I have to write the template keyword? In all other cases I do not have to, and it is quite weird that I do not have to write it when I use the .c_str() member function.
Why gcc-4 is not always an option
This program does not start when compiled with gcc-4 under Cygwin:
#include <iostream>
#include <locale>

class bar : public std::locale::facet {
public:
    bar(size_t refs = 0) : std::locale::facet(refs)
    {
    }
    static std::locale::id id;
};

std::locale::id bar::id;

using namespace std;

int main()
{
    std::locale l = std::locale(std::locale(), new bar());
    std::cout << has_facet<bar>(l) << std::endl;
    return 0;
}
And this code does not compile with gcc-4.3 under OpenSolaris 2009 (broken concept checks)...
#include <map>

struct tree {
    std::map<int,tree> left, right;
};
As mentioned elsewhere, that seems to be a compiler bug. Fair enough; those exist. Here's what you do about those:
#if defined(__GNUC__) && __GNUC__ < 4
    // Use erroneous syntax hack to work around a compiler bug.
    t4 = translate(s.c_str()).template str<TheChar>();
#else
    t4 = translate(s.c_str()).str<TheChar>();
#endif
GCC always defines __GNUC__ to the major compiler version number. If you need it, you also get __GNUC_MINOR__ and __GNUC_PATCHLEVEL__ for the y and z of the x.y.z version number.
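For example, the three macros are commonly folded into one comparable number (a sketch; GCC_VERSION is not a predefined macro, just a local convention):
// 3.4.6 becomes 30406, 4.4.3 becomes 40403, and so on.
#define GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)

#if defined(__GNUC__) && GCC_VERSION < 40000
    // pre-4.0 GCC: apply the workaround shown above
#endif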
This is a bug in the old compiler. Newer GCCs, from 4.0 to the (as yet unreleased) 4.5, accept it, as they should. It is standard C++. (Intel and Comeau accept it also.)
Regarding Cygwin and OpenSolaris, of course gcc-3.4 is not the only option: the newer versions (the released 4.4.3, or the unreleased 4.5 branch) work fine on these OSes. For Cygwin, it's part of the official distribution (see the gcc4* packages in the list). For OpenSolaris, you can compile it yourself (and instructions on how to do so can easily be found with Google).
I would try to use a different workaround, since adding the template disambiguator there is incorrect and will break if you move to a different compiler later on.
I don't know the real code, but passing a regular std::string seems to work (option 1: avoid converting to const char * just to create a temporary), or you could provide an overload of translate that takes a const char * argument (if the compiler does not complain there), depending on your requirements. A sketch of the second option follows.
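A minimal sketch of that overload, reusing the message and translate names from the question (whether it actually sidesteps the gcc-3.4 parse bug would need to be verified on that compiler):
// Extra overload so translate(s.c_str()) binds to const char* directly,
// without constructing a std::string argument at the call site.
inline message translate(const char *s)
{
    return message(std::string(s));
}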