Convert between string, u16string & u32string
This post explains the opposite of my question. So I need to post a new question
I need to convert wchar_t to char16_t. I found a sample of doing the opposite ( char16_t -> wchar_t) here:
I am not familiar with templates etc, sorry. Can anybody give me an example of converting wchar_t to char16_t please?
I have this piece of code that I want to adapt for converting wchar_t to char16_t.
std::wstring u16fmt(const char16_t* str) {
std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> convert_wstring;
std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
std::string utf8str = convert.to_bytes(str);
std::wstring wstr = convert_wstring.from_bytes(utf8str);
return wstr;
}
Ah, and it should run on Windows and Linux
If sizeof( wchar_t ) == 2 (*), you're saddled with Windows and can only hope your wstring holds UTF-16 (and hasn't been smashed flat to UCS-2 by some old Windows function).
If sizeof( wchar_t ) == 4 (*), you're not on Windows and need to do a UTF-32 to UTF-16 conversion.
(*): Assuming CHAR_BIT == 8.
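As a rough sketch of those two cases without any library: the helper below is hypothetical (the name wide_to_utf16 is mine), and it assumes that a 2-byte wchar_t already carries UTF-16 code units and that a 4-byte wchar_t carries UTF-32 code points. It does not validate lone surrogates.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of any library. Assumes a 2-byte wchar_t
// already holds UTF-16 code units and a 4-byte wchar_t holds UTF-32 code points.
std::u16string wide_to_utf16(const std::wstring& in) {
    std::u16string out;
    out.reserve(in.size());
    for (wchar_t wc : in) {
        char32_t cp = static_cast<char32_t>(wc);
        if (sizeof(wchar_t) == 2 || cp < 0x10000) {
            // UTF-16 code unit or BMP code point: copy through unchanged
            // (lone surrogates are not validated here).
            out.push_back(static_cast<char16_t>(cp));
        } else if (cp <= 0x10FFFF) {
            // Supplementary code point: split into a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            // Not a valid code point: emit U+FFFD REPLACEMENT CHARACTER.
            out.push_back(static_cast<char16_t>(0xFFFD));
        }
    }
    return out;
}

// Usage check: U+1F600 lies outside the BMP, so it needs two UTF-16 code units.
void wide_to_utf16_example() {
    assert(wide_to_utf16(L"abc") == u"abc");
    assert(wide_to_utf16(L"\U0001F600").size() == 2);
}
```

The same source compiles on Windows and Linux because the branch on sizeof(wchar_t) is resolved per platform; c_str() on the result gives the const char16_t* asked for.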
I am, however, rather pessimistic about standard library's Unicode capabilities beyond simple "piping through", so if you're going to do any actual work on those strings, I'd recommend ICU, the de-facto C/C++ standard library for all things Unicode.
icu::UnicodeString has a wchar_t * constructor, and you can call getTerminatedBuffer() to get a (non-owning) const char16_t *. Or, of course, just use icu::UnicodeString, which uses UTF-16 internally.
There is the function wcsncat_s() for concatenating two wchar_t*:
errno_t wcsncat_s( wchar_t *restrict dest, rsize_t destsz, const wchar_t *restrict src, rsize_t count );
Is there an equivalent function for concatenating two char16_t?
Not really.
On Windows, though, wchar_t is functionally identical to char16_t, so you could just cast your char16_t* to a wchar_t*.
Otherwise you can do it simply enough by writing yourself a function for it.
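For what it's worth, such a function could look roughly like the sketch below. The name u16_strncat and its bool return are my own invention, and it only mimics the bounds checking of wcsncat_s(); it does not reproduce the constraint-handler behaviour of the real function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>  // std::char_traits

// Hypothetical helper: appends at most `count` code units of `src` to the
// null-terminated string in `dest`, whose total capacity is `destsz` code units.
// Returns false and leaves `dest` untouched if the result would not fit.
bool u16_strncat(char16_t* dest, std::size_t destsz,
                 const char16_t* src, std::size_t count) {
    if (dest == nullptr || src == nullptr || destsz == 0) return false;

    // Length of the existing string, never reading past destsz.
    std::size_t dest_len = 0;
    while (dest_len < destsz && dest[dest_len] != u'\0') ++dest_len;
    if (dest_len == destsz) return false;  // dest is not terminated

    // Number of code units to append: stop at count or at src's terminator.
    std::size_t src_len = 0;
    while (src_len < count && src[src_len] != u'\0') ++src_len;

    // One slot must remain for the terminating null.
    if (src_len >= destsz - dest_len) return false;

    std::char_traits<char16_t>::copy(dest + dest_len, src, src_len);
    dest[dest_len + src_len] = u'\0';
    return true;
}

// Usage check: "foo" + "bar" fits comfortably in 16 code units.
void u16_strncat_example() {
    char16_t buf[16] = u"foo";
    assert(u16_strncat(buf, 16, u"bar", 3));
    assert(std::u16string(buf) == u"foobar");
}
```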
You could use std::u16string if you want something portable.
std::u16string str1(u16"The quick brown fox ");
std::u16string str2(u16"Jumped over the lazy dog");
std::u16string str3 = str1+str2; // concatenate
const char16_t* psz = str3.c_str();
psz remains valid only as long as str3 stays in scope and is not modified.
But the more portable and flexible solution is to just use wchar_t everywhere (which is 32-bit on Mac). Unless you explicitly need 16-bit character strings (perhaps for a specific UTF-16 processing routine), it's easier to keep your code in the wide char (wchar_t) space. It plays nicer with native APIs and libraries on Mac and Windows.
I found a bunch of questions on a similar topic, but nothing regarding wide to wide conversion with <codecvt>, which is supposed to be the correct choice in the modern code.
The std::codecvt_utf16<wchar_t> seems to be a logical choice to perform the conversion.
However, std::wstring_convert seems to expect a std::string at one end. The method names from_bytes and to_bytes emphasize this purpose.
I mean, the best solution so far is something like std::copy, which might work for my specific case, but seems kinda low tech and probably not too correct either.
I have a strong feeling that I am missing something rather obvious.
Cheers.
The std::wstring_convert and std::codecvt... classes are deprecated in C++17 onward. There is no longer a standard way to convert between the various string classes.
If your compiler still supports the classes, you can certainly use them. However, you cannot convert directly from std::u16string to std::wstring (and vice versa) with them. You will have to convert to an intermediate UTF-8 std::string first, and then convert that afterwards, eg:
std::u16string utf16 = ...;
std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> utf16conv;
std::string utf8 = utf16conv.to_bytes(utf16);
std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> wconv;
std::wstring wstr = wconv.from_bytes(utf8);
Just know that this approach will break when the classes are eventually dropped from the standard library.
Using std::copy() (or simply the std::wstring constructors and assign() overloads) will work only on Windows, where wchar_t and char16_t are both 16 bits wide and both hold UTF-16:
std::u16string utf16 = ...;
std::wstring wstr;
#ifdef _WIN32
wstr.reserve(utf16.size());
std::copy(utf16.begin(), utf16.end(), std::back_inserter(wstr));
/*
or: wstr = std::wstring(utf16.begin(), utf16.end());
or: wstr.assign(utf16.begin(), utf16.end());
or: wstr = std::wstring(reinterpret_cast<const wchar_t*>(utf16.c_str()), utf16.size());
or: wstr.assign(reinterpret_cast<const wchar_t*>(utf16.c_str()), utf16.size());
*/
#else
// do something else ...
#endif
But on other platforms, where wchar_t is 32 bits wide and holds UTF-32, you will need to actually convert the data, using the code shown above, a platform-specific API, or a 3rd party Unicode library that can do the conversion, such as libiconv, ICU, etc.
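If pulling in a library is not an option, the conversion itself is small enough to write by hand. The following is only a sketch: utf16_to_wide is a hypothetical name, and it assumes a 4-byte wchar_t is meant to hold UTF-32 while a 2-byte wchar_t is meant to hold UTF-16. Unpaired surrogates are replaced with U+FFFD in the 4-byte case.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-16 to a wide string. With a 2-byte wchar_t
// the code units are copied as-is; with a 4-byte wchar_t surrogate pairs are
// combined into single UTF-32 code points.
std::wstring utf16_to_wide(const std::u16string& in) {
    std::wstring out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (sizeof(wchar_t) >= 4) {
            const bool high = cp >= 0xD800 && cp <= 0xDBFF;
            const bool low = cp >= 0xDC00 && cp <= 0xDFFF;
            if (high && i + 1 < in.size() &&
                in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                // Valid pair: combine both halves and skip the low surrogate.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else if (high || low) {
                // Unpaired surrogate: substitute U+FFFD.
                cp = 0xFFFD;
            }
        }
        out.push_back(static_cast<wchar_t>(cp));
    }
    return out;
}

// Usage check: plain ASCII passes through unchanged on every platform.
void utf16_to_wide_example() {
    assert(utf16_to_wide(u"hello") == L"hello");
}
```

This could stand in for the "do something else" branch of the #ifdef above, though a real Unicode library will handle validation more thoroughly.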
"you cannot convert directly from std::u16string to std::wstring (and vice versa) with them. You will have to convert to an intermediate UTF-8 std::string first, and then convert that afterwards"
This doesn't appear to be the case, as "clang: converting const char16_t* (UTF-16) to wstring (UCS-4)" shows:
u16string s = u"hello";
wstring_convert<codecvt_utf16<wchar_t, 0x10ffff, little_endian>,
wchar_t> conv;
wstring ws = conv.from_bytes(
reinterpret_cast<const char*> (&s[0]),
reinterpret_cast<const char*> (&s[0] + s.size()));
Is there an ICU function to create a std::wstring from an icu::UnicodeString? I have been searching the ICU manual but haven't been able to find one.
(I know I can convert the UnicodeString to UTF-8 and then convert that to a platform-dependent wchar_t*, but I am looking for a single function in UnicodeString that can do this conversion.)
The C++ standard doesn't dictate any specific encoding for std::wstring. On Windows systems, wchar_t is 16-bit, and on Linux, macOS, and several other platforms, wchar_t is 32-bit. As far as C++'s std::wstring is concerned, it is just an arbitrary sequence of wchar_t in much the same way that std::string is just an arbitrary sequence of char.
It seems that icu::UnicodeString has no in-built way of creating a std::wstring, but if you really want to create a std::wstring anyway, you can use the C-based API u_strToWCS() like this:
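icu::UnicodeString ustr = /* get from somewhere */;
std::wstring wstr;
int32_t requiredSize = 0;
UErrorCode error = U_ZERO_ERROR;
// obtain the size of string we need; with a capacity of 0 this preflight call
// is expected to set error to U_BUFFER_OVERFLOW_ERROR
u_strToWCS(nullptr, 0, &requiredSize, ustr.getBuffer(), ustr.length(), &error);
// resize accordingly (this will not include any terminating null character, but it also doesn't need to either)
wstr.resize(requiredSize);
// reset the status: ICU functions return immediately if it already holds a failure code
error = U_ZERO_ERROR;
// copy the UnicodeString buffer to the std::wstring (non-const data() requires C++17)
u_strToWCS(wstr.data(), wstr.size(), nullptr, ustr.getBuffer(), ustr.length(), &error);
// U_FAILURE(error) is true at this point if the conversion failed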
Supposedly, u_strToWCS() will use the most efficient method for converting from UChar to wchar_t (if they are the same size, then it is just a straightforward copy, I suppose).
Is there any method?
My computer is AMD64.
::std::string str;
BOOL loadU(const wchar_t* lpszPathName, int flag = 0);
When I used:
loadU(&str);
the VS2005 compiler says:
Error 7 error C2664:: cannot convert parameter 1 from 'std::string *__w64 ' to 'const wchar_t *'
How can I do it?
First convert it to std::wstring:
std::wstring widestr = std::wstring(str.begin(), str.end());
Then get the C string:
const wchar_t* widecstr = widestr.c_str();
This works only for ASCII strings; it will not work if the underlying string is UTF-8 encoded. A conversion routine such as MultiByteToWideChar() handles that case properly.
If you have a std::wstring object, you can call c_str() on it to get a wchar_t*:
std::wstring name( L"Steve Nash" );
const wchar_t* szName = name.c_str();
Since you are operating on a narrow string, however, you would first need to widen it. There are various options here; one is to use Windows' built-in MultiByteToWideChar routine. That will give you an LPWSTR, which is equivalent to wchar_t*.
You can use the ATL text conversion macros to convert a narrow (char) string to a wide (wchar_t) one. For example, to convert a std::string:
#include <atlconv.h>
...
std::string str = "Hello, world!";
CA2W pszWide(str.c_str());
loadU(pszWide);
You can also specify a code page, so if your std::string contains UTF-8 chars you can use:
CA2W pszWide(str.c_str(), CP_UTF8);
Very useful but Windows only.
If you are on Linux/Unix, have a look at mbstowcs() and wcstombs(), which are standard C functions (ISO C90) declared in <stdlib.h>.
mbs stands for "multibyte string", which is basically the usual zero-terminated C string.
wcs stands for "wide character string", which is an array of wchar_t.
For more background details on wide chars have a look at glibc documentation here.
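A minimal sketch of how mbstowcs() might be wrapped for a std::string follows. The helper name widen is hypothetical. Passing a null destination to ask for the required length is a POSIX extension rather than plain ISO C, and the result depends on the current C locale, so a program that wants UTF-8 input handled would normally call setlocale(LC_ALL, "") first (assuming the environment uses a UTF-8 locale).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: widens a multibyte string using the current C locale.
// Returns false if the input contains a byte sequence that is invalid in that
// locale. Conversion stops at the first embedded null byte.
bool widen(const std::string& in, std::wstring& out) {
    // Null destination: ask how many wide characters are needed (POSIX extension).
    const std::size_t needed = std::mbstowcs(nullptr, in.c_str(), 0);
    if (needed == static_cast<std::size_t>(-1)) return false;

    out.resize(needed);
    if (needed > 0) {
        // Writes exactly `needed` wide characters; std::wstring keeps its own terminator.
        std::mbstowcs(&out[0], in.c_str(), needed);
    }
    return true;
}

// Usage check: ASCII converts the same way in every locale.
void widen_example() {
    std::wstring w;
    assert(widen("Hello, world!", w));
    assert(w == L"Hello, world!");
}
```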
I need to pass a wchar_t string to a function, and I first need to build that string from a string literal concatenated with an integer variable.
The original string looks like this, where 4 is the physical drive number, but I want it to be changeable so that it matches whatever drive number I pass to the function:
auto TargetDrive = L"\\\\.\\PhysicalDrive4";
The following works
int a = 4;
std::string stddrivestring = "\\\\.\\PhysicalDrive" + std::to_string(a);
std::wstring widedrivestring = std::wstring(stddrivestring.begin(), stddrivestring.end());
const wchar_t* TargetDrive = widedrivestring.c_str();
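A possibly simpler route is to stay in wide characters from the start with std::to_wstring(), which avoids the narrow intermediate string. The function name physical_drive_path below is just a hypothetical wrapper for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical wrapper: builds the device path directly as a wide string.
std::wstring physical_drive_path(int drive_number) {
    return L"\\\\.\\PhysicalDrive" + std::to_wstring(drive_number);
}

// Usage check: drive 4 yields the same text as the original literal.
void physical_drive_path_example() {
    const std::wstring path = physical_drive_path(4);
    assert(path == L"\\\\.\\PhysicalDrive4");
}
```

As with the version above, the std::wstring has to stay alive for as long as the pointer returned by c_str() is in use.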