I have written some code, but there is a problem. In this code I am trying to convert a string to a wstring, but the string contains "█" characters. This character has code 219 in extended ASCII.
The conversion produces the wrong result.
In my code:
string strs = "█and█something else";
wstring wstr(strs.begin(), strs.end());
After debugging, I get this result:
?and?something else
How do I correct this problem?
Thanks...
The C-library solution for converting between the system's narrow and wide encodings uses the mbsrtowcs and wcsrtombs functions from the <cwchar> header. I've spelt this out in this answer.
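As a minimal sketch of the C-library route (assuming the process locale has been set from the environment at program start; the helper name widen is mine, not a library function):
#include <cwchar>
#include <clocale>
#include <stdexcept>
#include <string>

// Sketch: convert a narrow (multibyte) string in the current locale's
// encoding to a wide string, using the C library's mbsrtowcs.
std::wstring widen(const std::string& s)
{
    const char* src = s.c_str();
    std::mbstate_t state{};
    // First pass with a null destination just measures the length.
    std::size_t len = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence");
    std::wstring ws(len, L'\0');
    src = s.c_str();
    state = std::mbstate_t{};
    std::mbsrtowcs(&ws[0], &src, ws.size(), &state);
    return ws;
}

int main()
{
    std::setlocale(LC_ALL, "");   // use the environment's locale
    std::wstring ws = widen("and something else");
}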
In C++11, you can use the wstring_convert template instantiated with a suitable codecvt facet. Unfortunately this requires some custom rigging, which is spelt out on the cppreference page.
I've adapted it here into a self-contained example which converts a wstring to a string, converting from the system's wide into the system's narrow encoding:
#include <iostream>
#include <string>
#include <locale>
#include <codecvt>

// utility wrapper to adapt locale-bound facets for wstring/wbuffer convert
template <typename Facet>
struct deletable_facet : Facet
{
    using Facet::Facet;
};

int main()
{
    std::wstring_convert<
        deletable_facet<std::codecvt<wchar_t, char, std::mbstate_t>>> conv;

    std::wstring ws(L"Hello world.");
    std::string ns = conv.to_bytes(ws);
    std::cout << ns << std::endl;
}
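The same converter also works in the other direction, which is what the question actually asks for: from_bytes decodes a narrow string in the system encoding into a std::wstring. A minimal sketch, repeating the facet wrapper so the snippet stands alone:
#include <locale>
#include <string>

template <typename Facet>
struct deletable_facet : Facet { using Facet::Facet; };

int main()
{
    std::wstring_convert<
        deletable_facet<std::codecvt<wchar_t, char, std::mbstate_t>>> conv;
    // from_bytes: narrow (system encoding) -> wide
    std::wstring ws = conv.from_bytes("and something else");
}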
I want to create a file with an Arabic name in C++. Below is the program that I tried.
#include <iostream>
#include <fstream>
#include <string>

int main() {
    const char *path = "D:\\user\\c++\\files\\فثسف.txt";
    std::ofstream file(path); // open in constructor
    std::string data("Hello World");
    file << data;
    return 0;
}
But the file gets created with junk characters: ÙثسÙ.txt.
I am using the Windows platform and the g++ compiler.
The default encoding for string literals can be specified with the -fexec-charset compiler option for gcc/g++ (e.g. -fexec-charset=UTF-8).
In C++11 and later, you can also use the u8, u, and U prefixes on string literals to specify UTF-8, UTF-16, and UTF-32 encodings:
const char * utf8literal = u8"This is a Unicode UTF-8 string! 剝Ц";
const char16_t * utf16literal = u"This is a Unicode UTF-16 string! 剝Ц";
const char32_t * utf32literal = U"This is a Unicode UTF-32 string! 剝Ц";
Using the above prefixes can upset some functions that aren't expecting these specific string types, though; in general it may be better to set the compiler option.
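As one concrete illustration of that caveat (assuming a C++20 compiler): u8 literals changed type from const char* to const char8_t* in C++20, so code that passed them where a std::string or const char* was expected stops compiling:
#include <string>

int main()
{
    // In C++11/14/17, u8"..." has type const char* and this compiles:
    //     std::string s = u8"text";
    // In C++20, u8"..." has type const char8_t*, so that line is an error.
    std::u8string s8 = u8"text";                       // C++20 type
    std::string s(reinterpret_cast<const char*>(s8.data()),
                  s8.size());                          // explicit bridge
}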
There's a great writeup about this topic in this blog post: http://cppwhispers.blogspot.com/2012/11/unicode-and-your-application-3-of-n.html
I hope this helps. :)
Use UTF-8:
#include <iostream>
#include <fstream>
#include <filesystem>

int main()
{
    namespace fs = std::filesystem;
    fs::path path { u8"فثسف.txt" };
    std::ofstream file { path };
    file << "Hello World";
    return 0;
}
Using the <filesystem> library may require additional compiler/linker options: older GNU implementations require linking with -lstdc++fs, and older LLVM implementations require linking with -lc++fs.
I am iterating through all the files in a folder and just want their names as strings. I want to get a string from a std::filesystem::path. How do I do that?
My code:
#include <string>
#include <iostream>
#include <filesystem>

namespace fs = std::experimental::filesystem;

int main()
{
    std::string path = "C:/Users/user1/Desktop";
    for (auto & p : fs::directory_iterator(path))
        std::string fileName = p.path;
}
However I get the following error:
non-standard syntax; use '&' to create a pointer to a member.
To convert a std::filesystem::path to a natively-encoded string (whose type is std::filesystem::path::value_type), use the string() method. Note the other *string() methods, which enable you to obtain strings of a specific encoding (e.g. u8string() for a UTF-8 string).
C++17 example:
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

int main()
{
    fs::path path{fs::u8path(u8"愛.txt")};
    std::string path_string{path.u8string()};
}
C++20 example (better language and library UTF-8 support):
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

int main()
{
    fs::path path{u8"愛.txt"};
    std::u8string path_string{path.u8string()};
}
The examples given in the accepted answer, using the UTF-8 operations, are fine and a good guideline. There is just one error in the introductory explanation given in the answer, which Windows/MSVC developers should be aware of:
The string() method does not return the natively-encoded string (which would be std::wstring on Windows); rather, it always returns a std::string. It also tries to convert the path to the local encoding, which is not always possible: if the path contains a Unicode character that is not representable in the current code page, the method throws an exception!
If you actually want the behavior that is described in the answer (method returns native string, i.e., std::string on Linux and std::wstring on Windows), you would have to use the native() method or the implicit conversion based on std::filesystem::path::operator string_type(), but as #tambre correctly pointed out in the examples, you should consider using the UTF-8 versions throughout.
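For illustration, a minimal sketch contrasting the two (the path name is hypothetical):
#include <filesystem>
#include <string>

int main()
{
    std::filesystem::path p{"example.txt"};

    // native(): returns std::filesystem::path::string_type, i.e.
    // std::wstring on Windows and std::string on POSIX systems.
    std::filesystem::path::string_type n = p.native();

    // string(): always returns std::string, converting to the local
    // narrow encoding; this can throw if a character in the path is
    // not representable in the current code page.
    std::string s = p.string();
}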
In C++17 and above, you can use .generic_string() to convert the path to a string in generic format (forward slashes as directory separators): https://en.cppreference.com/w/cpp/filesystem/path/generic_string.
The following is an example that gets the current working directory and converts it into a string.
#include <string>
#include <filesystem>

using std::filesystem::current_path;

int main()
{
    std::filesystem::path directoryPath = current_path();
    std::string stringpath = directoryPath.generic_string();
}
I have a string or a char[], but it is encoded in UTF-16, like this:
Now I want to convert it to UTF-8 in a new string. Please help me! I already tried like this:
But the compiler tells me I have a problem. How do I solve it?
The problem is evident: you define u16_str as a std::string, while cvt.to_bytes() expects a std::u16string (as the name of the variable suggests).
The following code works for me:
#include <locale>
#include <codecvt>
#include <iostream>
#include <string>

int main()
{
    std::u16string u16_str { u"aeiuoàèìòùAEIOU" };

    std::wstring_convert<std::codecvt_utf8<char16_t>, char16_t> cvt;
    std::string u8_str = cvt.to_bytes(u16_str);

    std::cout << u8_str << std::endl;

    return 0;
}
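One caveat worth noting: std::codecvt_utf8<char16_t> treats its input as UCS-2, so characters outside the Basic Multilingual Plane (stored as surrogate pairs in a u16string) are not converted correctly. For full UTF-16 input, std::codecvt_utf8_utf16<char16_t> is the appropriate facet (both facets are deprecated since C++17 but still available). A minimal variation:
#include <locale>
#include <codecvt>
#include <string>

int main()
{
    // codecvt_utf8_utf16 handles surrogate pairs, i.e. full UTF-16.
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> cvt;
    std::string u8_str = cvt.to_bytes(u"aeiuoàèìòùAEIOU");
}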
#include <iostream>

int main() {
    std::wstring str = L"\u00A2";
    std::wcout << str;
    return 0;
}
Why doesn't this work, and how do I solve it?
It doesn't work because in the default C locale, there is no character which corresponds to U+00A2.
If you're using a standard Ubuntu install, it is most likely that your user locale uses a larger character set than US-ASCII, quite possibly Unicode encoded with UTF-8. So you just need to switch to the locale specified in the environment, as follows:
#include <iostream>
/* clocale is needed for std::setlocale */
#include <clocale>
#include <string>

int main() {
    /* The following switches to the locale specified
     * by the LC_ALL environment variable.
     */
    std::setlocale(LC_ALL, "");

    std::wstring str = L"\u00A2";
    std::wcout << str;
    return 0;
}
If you use std::string instead of std::wstring and std::cout instead of std::wcout, then you don't need the setlocale call, because no translation is needed (provided the console expects UTF-8).
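A minimal sketch of that variant (assuming the compiler's execution charset is UTF-8, which is the g++ default, and a UTF-8 console):
#include <iostream>
#include <string>

int main()
{
    // The narrow literal is stored as the UTF-8 bytes 0xC2 0xA2; they are
    // written to the console as-is, with no locale translation involved.
    std::string str = "\u00A2";
    std::cout << str << '\n';
    return 0;
}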
I'm trying to convert UTF-16 encoded strings to UCS-4
If I understand correctly, C++11 provides this conversion through codecvt_utf16.
My code is something like:
#include <iostream>
#include <locale>
#include <memory>
#include <codecvt>
#include <string>

using namespace std;

int main()
{
    u16string s;
    s.push_back('h');
    s.push_back('e');
    s.push_back('l');
    s.push_back('l');
    s.push_back('o');

    wstring_convert<codecvt_utf16<wchar_t>, wchar_t> conv;
    wstring ws = conv.from_bytes(reinterpret_cast<const char*>(s.c_str()));

    wcout << ws << endl;

    return 0;
}
Note: the explicit push_backs are to get around the fact that my version of Clang (Xcode 4.2) doesn't have Unicode string literals.
When the code is run, I get a terminate exception. Am I doing something illegal here? I was thinking it should work, because the const char* that I passed to wstring_convert is UTF-16 encoded, right? I have also considered endianness being the issue, but I have checked that it's not the case.
Two errors:
1) The from_bytes() overload that takes a single const char* expects a null-terminated byte string, but your very second byte is '\0'.
2) Your system is likely little-endian, so you need to convert from UTF-16LE to UCS-4:
#include <iostream>
#include <locale>
#include <memory>
#include <codecvt>
#include <string>

using namespace std;

int main()
{
    u16string s;
    s.push_back('h');
    s.push_back('e');
    s.push_back('l');
    s.push_back('l');
    s.push_back('o');

    wstring_convert<codecvt_utf16<wchar_t, 0x10ffff, little_endian>,
                    wchar_t> conv;
    wstring ws = conv.from_bytes(
        reinterpret_cast<const char*>(&s[0]),
        reinterpret_cast<const char*>(&s[0] + s.size()));

    wcout << ws << endl;

    return 0;
}
Tested with Visual Studio 2010 SP1 on Windows and Clang++/libc++-svn on Linux.