glibmm 2.4 invalid byte sequence in conversion input - c++

Trying to use Glibmm-2.4 to output Chinese returns "Invalid byte sequence in conversion input".
#include <iostream>
#include <glibmm/ustring.h>
#include <glibmm/convert.h>
int main()
{
Glib::ustring myUstring = "中";
try
{
std::cout << myUstring;
}
catch(const Glib::ConvertError& e)
{
std::cout << e.what();
}
return 0;
}
Any idea what I am doing wrong here?
Using Ubuntu 14.04.2 i386 LTS, g++, glibmm-2.4

The locale needs to be initialized, e.g. with std::locale::global(std::locale(""));, before streaming a ustring. The stream operator performs an implicit conversion to the locale's character set, and without that initialization characters outside of ASCII are not converted correctly.
This is documented here

Related

How do I fix this: no matching function for call to 'stoi(int&)'

I keep getting this error. I know this is a C++11 function, but it still isn't working with the Code::Blocks C++ compiler. Am I using this function correctly, or is it a problem with the Code::Blocks compiler? I tried changing the compiler settings and enabling "Have g++ follow the C++11 ISO standard", but I still get this error, or the error "stoi() does not exist in the current scope".
#include <iostream>
#include <string>
using namespace std;
int main()
{
int test = 34;
cout << stoi(test);
}
stoi means "String To Int". It will read an int from a std::string (or std::wstring). See also the reference.
You were probably looking for the reverse std::to_string (reference). But you don't need either, there is no need to convert to string before printing:
#include <iostream>
int main()
{
int test = 34;
std::cout << test;
}
stoi means string to int. So it takes a string as an input.
This should work:
string test = "34"; cout << stoi(test);

basic_regex throws bad_cast with char32_t

Why does the following code generate std::bad_cast exception?
#include <iostream>
#include <regex>
#include <string>
int main()
{
std::basic_string<char32_t> reg = U"^\\w";
try
{
std::basic_regex<char32_t> tagRegex(reg);
}
catch(std::exception &e)
{
std::cout << e.what() << std::endl;
}
return 0;
}
This sample on Ideone for convenience: https://ideone.com/Saea88
Using char or wchar_t instead of char32_t runs without throwing though (proof: https://ideone.com/OBlXed).
You can find here: http://en.cppreference.com/w/cpp/regex/regex_traits:
To use std::basic_regex with other character types (for example, char32_t), a user-provided trait class must be used.
So you would have to implement std::regex_traits<char32_t> yourself.
To see why the standard library provides no definition for it, see: Why is there no definition for std::regex_traits<char32_t> (and thus no std::basic_regex<char32_t>) provided?
On GCC or Clang, the code compiles fine even with custom regex traits, but fails at runtime with std::bad_cast. If that is what brought you here: the exception comes from std::use_facet<std::ctype<char32_t>>, which throws because the current locale has no such facet. You have to specialize std::ctype<char32_t> and use std::locale::global to install a new locale constructed from the old one plus the specialized facet.

Is this simple C++ program using <locale> correct?

This code seemed to work ok in (ubuntu trusty) versions of gcc and clang, and in Win 7 on a VM via mingw... Recently I upgraded to Wily and builds made with clang crash consistently here.
#include <iostream>
#include <locale>
#include <string>
int main() {
std::cout << "The locale is '" << std::locale("").name() << "'" << std::endl;
}
Sometimes it's a gibberish string followed by Aborted (core dumped), and sometimes it's an invalid free.
$ ./a.out
The locale is 'en_US.UTF-8QX�у�X�у����0�����P�����\�(��\�(��\�(��h��t�������������y���������ț�ԛ�������en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_UP����`�������������������������p�����������#��������������`�������������p��������������������#��#��#��`��������p������������0��P��p���qp��!en_US.UTF-8QЈ[�����\�(��\�(��\�(�����������#�� �����P�����0�����P�����\�(��\�(��\�(��Ȣ�Ԣ����������������(��4��#��L��en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8!�v[��������������#�� �����P�����0�����P�����\�(��\�(���(��h��t��������������������Ȥ�Ԥ�������en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8!��[�� ����[�������7����7��.,!!x�[��!��[��!�[��#�����������#�� �����P�����0�����P�����\�(��\�(��\�(��(��4��#��L��X��d��p��|������������n_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8ѻAborted (core dumped)
$ ./a.out
The locale is 'en_US.UTF-8QX\%�QX\%�Q�G�0H��H�PI��I�\:|�Q\D|�Q\>|�QhK�tK��K��K��K��K��Q�K��K��K��K��K��K�en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8en_US.UTF-8ѻ
*** Error in `./a.out': free(): invalid pointer: 0x0000000000b04a98 ***
Aborted (core dumped)
(Both program outputs above were abbreviated greatly or they would not fit in this question.)
I also got an invalid free on Coliru with it as well.
But this is very similar to example code on cppreference:
#include <iostream>
#include <locale>
#include <string>
int main()
{
std::wcout << "User-preferred locale setting is " << std::locale("").name().c_str() << '\n';
// on startup, the global locale is the "C" locale
std::wcout << 1000.01 << '\n';
// replace the C++ global locale as well as the C locale with the user-preferred locale
std::locale::global(std::locale(""));
// use the new global locale for future wide character output
std::wcout.imbue(std::locale());
// output the same number again
std::wcout << 1000.01 << '\n';
}
Actually, that code crashes Coliru also...
More crashes of similar code from Coliru.
Is this a bug in the c++ library used by clang, or is this code defective?
Note also: these crashes seem to be restricted to the C++ API. If you use <clocale> instead, things seem to work okay, so it may just be some trivial problem in the C++ bindings over it?
Variations using setlocale: 1 2 3
Looks like this is caused by libstdc++'s ABI change in its basic_string, which was needed for C++11 conformance. To manage this transition, GCC added the abi_tag attribute, which changes the mangled name of functions so that functions for the new and old ABI can be distinguished, even if the change wouldn't otherwise affect the mangled name (e.g. the return type of a function).
This code
#include <locale>
#include <string>
int main() {
std::locale().name();
}
on GCC emits a call to _ZNKSt6locale4nameB5cxx11Ev, which demangles to std::locale::name[abi:cxx11]() const, and returns an SSO string with the new ABI.
Clang, on the other hand, doesn't support the abi_tag attribute, and emits a call to _ZNKSt6locale4nameEv, which demangles to simply std::locale::name() const - which is the version returning a COW string (the old ABI).
The net result is that the program ends up trying to use a COW string as an SSO string when compiled with Clang. Havoc ensues.
The obvious workaround is to force the old ABI via -D_GLIBCXX_USE_CXX11_ABI=0.
I think the "" parameter might be corrupting something. I don't think it's a legal argument?
To verify it's nothing else, try running this:
#include <iostream>
#include <locale>
int main() {
std::locale("").name();
}
It compiles and runs just fine with GCC:
g++ -Wall -pedantic locale.cpp
<= No errors, no warnings
./a.out
The locale is 'en_US.UTF-8'
<= Expected output
ADDENDUM:
Exactly the same with MSVS 2013 - no errors or warnings compiling; no errors running:
locale.cpp =>
#include <iostream>
#include <locale>
#include <string>
int main() {
std::cout << "The locale is '" << std::locale("").name() << "'" << std::endl;
}
Output =>
locale
The locale is 'English_United States.1252'

In C++11, how to print a character given its Unicode code?

I am sure this is a dumb question, but is there a straightforward way to print a Unicode character in C++ given its hex code? For instance: I know that the code for ❤ is 0x2764. Is there a way I can use its code to print it (either via printf or on a stream)?
For the record, I can print the character by writing:
cout << "\u2764" << endl;
but that requires knowing the value at compile time rather than using a variable.
Thanks
From a comment I see you're on OS X, which uses UTF-8 and has a sufficiently complete implementation of the C++11 library (libc++) for the following to work.
#include <codecvt> // wstring_convert, codecvt_utf8
#include <iostream> // cout
int main() {
std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> convert;
std::cout << convert.to_bytes(static_cast<char32_t>(0x2764)) << '\n';
}
This depends on the console working properly with UTF-8, which OS X's does.
Be sure to output wchar_t to the stream.
#include <iostream>
int main() {
std::wcout << static_cast<wchar_t>(0x2764) << std::endl;
}

Apparently missing overload of getline() taking RRef to stream in GCC 4.7.2 and Clang 3.2

I ran into an unexpected compilation error when trying to use getline() with a temporary stream object:
#include <iostream>
#include <string>
#include <sstream>
using namespace std;
int main()
{
string input = "hello\nworld\nof\ndelimiters";
string line;
if (getline(stringstream(input), line)) // ERROR!
{
cout << line << endl;
}
}
It looks like no overload of getline() exists that accepts an rvalue reference to a stream object. If I change main() to use an lvalue, it compiles and runs as expected:
int main()
{
string input = "hello\nworld\nof\ndelimiters";
string line;
stringstream ss(input);
if (getline(ss, line)) // OK
{
cout << line << endl;
}
}
So I had a look in the C++11 Standard and I found out (§ 21.4.8.9) that an overload of getline() that takes an rvalue reference to a stream object should be present.
Am I missing something obvious, or is this a bug? The error occurs both with GCC 4.7.2 and with Clang 3.2. I cannot test this on VC at the moment.
If I compile on OS X with the following line, it compiles successfully. Which version of libstdc++ or libc++ are you using?
clang++ -std=c++11 -stdlib=libc++ foo.cc
libstdc++ (and libc++ for that matter) do not yet fully implement the C++ 2011 standard library. This appears to be one of the missing functions from libstdc++.
Sadly, I don't know of any resource that lists exactly what is missing from each implementation.