Is there a function to concatenate two char16_t - c++

There is the function wcsncat_s() for concatenating two wchar_t*:
errno_t wcsncat_s( wchar_t *restrict dest, rsize_t destsz, const wchar_t *restrict src, rsize_t count );
Is there an equivalent function for concatenating two char16_t?

Not really.
On Windows, though, wchar_t is functionally identical to char16_t, so you could just cast your char16_t* to a wchar_t*.
Otherwise you can do it simply enough by writing yourself a function for it.
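As a rough illustration of "writing yourself a function for it", here is a minimal sketch loosely modelled on the wcsncat_s() signature from the question. The name u16_concat and its bool return value are made up for this example and are not part of any standard library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a standard function): appends at most `count`
// characters of `src` to the null-terminated string in `dest`, which has
// room for `destsz` char16_t elements in total, terminator included.
// Returns false and leaves dest unchanged if the result would not fit.
bool u16_concat(char16_t* dest, std::size_t destsz,
                const char16_t* src, std::size_t count)
{
    if (dest == nullptr || src == nullptr || destsz == 0) return false;

    // Find the current end of dest without reading past destsz elements.
    std::size_t dest_len = 0;
    while (dest_len < destsz && dest[dest_len] != u'\0') ++dest_len;
    if (dest_len == destsz) return false; // dest is not terminated

    // Length of the part of src to append (stops at count or at the terminator).
    std::size_t src_len = 0;
    while (src_len < count && src[src_len] != u'\0') ++src_len;

    // Need room for src_len characters plus the terminator.
    if (src_len >= destsz - dest_len) return false;

    std::char_traits<char16_t>::copy(dest + dest_len, src, src_len);
    dest[dest_len + src_len] = u'\0';
    return true;
}
```

Calling u16_concat(buf, 16, u"bar", 10) on a char16_t buf[16] that holds u"foo" would leave u"foobar" in buf.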

You could use std::u16string if you want something portable.
std::u16string str1(u16"The quick brown fox ");
std::u16string str2(u16"Jumped over the lazy dog");
std::u16string str3 = str1+str2; // concatenate
const char16_t* psz = str3.c_str();
psz remains valid as long as str3 doesn't go out of scope and isn't modified.
But the more portable and flexible solution is to just use wchar_t everywhere (which is 32-bit on Mac). Unless you are explicitly using 16-bit char strings (perhaps for a specific UTF-16 processing routine), it's easier to just keep your code in the wide char (wchar_t) space. It plays nicer with native APIs and libraries on Mac and Windows.

Related

c++ how to convert wchar_t to char16_t

Convert between string, u16string & u32string
This post explains the opposite of my question. So I need to post a new question
I need to convert wchar_t to char16_t. I found a sample of doing the opposite ( char16_t -> wchar_t) here:
I am not familiar with templates etc, sorry. Can anybody give me an example of converting wchar_t to char16_t please?
I have this piece of code that I want to adapt for converting wchar_t to char16_t.
std::wstring u16fmt(const char16_t* str) {
std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> convert_wstring;
std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
std::string utf8str = convert.to_bytes(str);
std::wstring wstr = convert_wstring.from_bytes(utf8str);
return wstr;
}
Ah, and it should run on Windows and Linux
If sizeof( wchar_t ) == 2 (*), you're saddled with Windows and can only hope your wstring holds UTF-16 (and hasn't been smashed flat to UCS-2 by some old Windows function).
If sizeof( wchar_t ) == 4 (*), you're not on Windows and need to do a UTF-32 to UTF-16 conversion.
(*): Assuming CHAR_BIT == 8.
I am, however, rather pessimistic about standard library's Unicode capabilities beyond simple "piping through", so if you're going to do any actual work on those strings, I'd recommend ICU, the de-facto C/C++ standard library for all things Unicode.
icu::UnicodeString has a wchar_t * constructor, and you can call getTerminatedBuffer() to get a (non-owning) const char16_t *. Or, of course, just use icu::UnicodeString, which uses UTF-16 internally.
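If pulling in ICU is not an option, the two sizeof( wchar_t ) cases above can be handled by hand. The following is only a sketch: wide_to_u16 is a made-up name, and the code assumes the wstring holds UTF-16 code units when wchar_t is 2 bytes and UTF-32 code points when it is 4 bytes. It does not validate its input.

```cpp
#include <string>

// Hypothetical helper: converts a std::wstring to UTF-16.
// Assumption: 2-byte wchar_t holds UTF-16, 4-byte wchar_t holds UTF-32.
std::u16string wide_to_u16(const std::wstring& in)
{
    std::u16string out;
    out.reserve(in.size());
    for (wchar_t wc : in) {
        const char32_t cp = static_cast<char32_t>(wc);
        if (sizeof(wchar_t) == 2 || cp < 0x10000) {
            // Already a UTF-16 code unit, or a BMP code point: copy as is.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code point above the BMP: split into a surrogate pair.
            const char32_t v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return out;
}
```

The result's c_str() then gives a const char16_t* that stays valid for as long as the returned std::u16string is alive and unmodified.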

how does one convert std::u16string -> std::wstring using <codecvt>?

I found a bunch of questions on a similar topic, but nothing regarding wide to wide conversion with <codecvt>, which is supposed to be the correct choice in the modern code.
The std::codecvt_utf16<wchar_t> seems to be a logical choice to perform the conversion.
However, std::wstring_convert seems to expect std::string at one end. The methods from_bytes and to_bytes emphasize this purpose.
I mean, the best solution so far is something like std::copy, which might work for my specific case, but seems kinda low tech and probably not too correct either.
I have a strong feeling that I am missing something rather obvious.
Cheers.
The std::wstring_convert and std::codecvt... classes are deprecated in C++17 onward. There is no longer a standard way to convert between the various string classes.
If your compiler still supports the classes, you can certainly use them. However, you cannot convert directly from std::u16string to std::wstring (and vice versa) with them. You will have to convert to an intermediate UTF-8 std::string first, and then convert that afterwards, eg:
std::u16string utf16 = ...;
std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> utf16conv;
std::string utf8 = utf16conv.to_bytes(utf16);
std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> wconv;
std::wstring wstr = wconv.from_bytes(utf8);
Just know that this approach will break when the classes are eventually dropped from the standard library.
Using std::copy() (or simply the various std::wstring data construct/assign methods) will work only on Windows, where wchar_t and char16_t are both 16-bit in size representing UTF-16:
std::u16string utf16 = ...;
std::wstring wstr;
#ifdef _WIN32
wstr.reserve(utf16.size());
std::copy(utf16.begin(), utf16.end(), std::back_inserter(wstr));
/*
or: wstr = std::wstring(utf16.begin(), utf16.end());
or: wstr.assign(utf16.begin(), utf16.end());
or: wstr = std::wstring(reinterpret_cast<const wchar_t*>(utf16.c_str()), utf16.size());
or: wstr.assign(reinterpret_cast<const wchar_t*>(utf16.c_str()), utf16.size());
*/
#else
// do something else ...
#endif
But on other platforms, where wchar_t is 32-bit in size representing UTF-32, you will need to actually convert the data, using the code shown above, a platform-specific API, or a 3rd party Unicode library that can do the data conversion, such as libiconv or ICU.
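For the non-Windows branch, one option that does not depend on the deprecated <codecvt> classes is to decode the surrogate pairs by hand. This is a minimal sketch, not a full Unicode implementation: u16_to_wide is a made-up name, and unpaired surrogates are simply copied through unchanged rather than reported as errors.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-16 to std::wstring without <codecvt>.
// With a 2-byte wchar_t the code units are copied as is; with a 4-byte
// wchar_t each valid surrogate pair is combined into one code point.
std::wstring u16_to_wide(const std::u16string& in)
{
    std::wstring out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t unit = in[i];
        const bool high = unit >= 0xD800 && unit <= 0xDBFF;
        if (sizeof(wchar_t) >= 4 && high && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            const char32_t low = in[++i]; // consume the low surrogate too
            out.push_back(static_cast<wchar_t>(
                0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)));
        } else {
            out.push_back(static_cast<wchar_t>(unit));
        }
    }
    return out;
}
```

Because the 2-byte case falls through to a plain copy, the same function could be used on Windows as well, which would remove the need for the #ifdef.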
"you cannot convert directly from std::u16string to std::wstring (and vice versa) with them. You will have to convert to an intermediate UTF-8 std::string first, and then convert that afterwards"
This doesn't appear to be the case as
clang: converting const char16_t* (UTF-16) to wstring (UCS-4)
shows:
u16string s = u"hello";
wstring_convert<codecvt_utf16<wchar_t, 0x10ffff, little_endian>,
wchar_t> conv;
wstring ws = conv.from_bytes(
reinterpret_cast<const char*> (&s[0]),
reinterpret_cast<const char*> (&s[0] + s.size()));

Using iconv while maintaining code correctness

I'm currently using iconv to convert documents with different encodings.
The iconv() function has the following prototype:
size_t iconv (
iconv_t cd,
const char* * inbuf,
size_t * inbytesleft,
char* * outbuf,
size_t * outbytesleft
);
So far, I have only had to convert buffers of type char*, but I realized I might also have to convert buffers of type wchar_t*. In fact, iconv even has a dedicated encoding name "wchar_t" for such buffers. This encoding adapts to the operating system settings: on my computers, it refers to UCS-2 on Windows and to UTF-32 on Linux.
But here lies the problem: if I have a buffer of wchar_t*, I can reinterpret_cast it to a buffer of char* to use it in iconv, but then I face implementation-defined behavior: I cannot be sure that all compilers will behave the same regarding the cast.
What should I do here?
reinterpret_cast<char const*> is safe and not implementation defined, at least not on any real implementations.
The language explicitly allows any object to be reinterpreted as an array of characters and the way you get that array of characters is using reinterpret_cast.
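A small sketch of what that cast looks like in practice. ByteView and as_bytes are names invented for this example; the point is only that the pointer comes from reinterpret_cast and the length must be given in bytes, which is what iconv's inbytesleft parameter expects.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type: a pointer to raw bytes plus a byte count.
struct ByteView {
    const char* data;
    std::size_t size; // in bytes, not in characters
};

// Exposes the storage of a wide string as raw bytes.
ByteView as_bytes(const std::wstring& ws)
{
    // Reading any object through a char pointer is permitted by the
    // aliasing rules; the bytes appear in the platform's native byte order.
    return { reinterpret_cast<const char*>(ws.data()),
             ws.size() * sizeof(wchar_t) };
}
```

What the cast does not settle is the encoding of those bytes. That still depends on the platform's wchar_t, which is why the "wchar_t" encoding name exists in iconv.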

String comparisons. How can you compare string with std::wstring? WRT strcmp

I am trying to compare two formats that I expected would be somewhat compatible, since they are both generally strings. I have tried to perform strcmp with a string and std::wstring, and as I'm sure C++ gurus know, this will simply not compile. Is it possible to compare these two types? Is there an easy conversion here?
You need to convert your char* string - "multibyte" in ISO C parlance - to a wchar_t* string - "wide character" in ISO C parlance. The standard function that does that is called mbstowcs ("Multi-Byte String To Wide Character String").
NOTE: mbstowcs has been part of ISO C since C90, and C++ provides it as std::mbstowcs in <cstdlib>. MSVC and g++ both support it.
It is used thus:
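#include <cstdlib>
#include <string>
#include <vector>
const char* input = ...;
// Get the length in wide characters, NOT counting the terminating L'\0'.
// (Passing NULL as the destination is supported by POSIX and MSVC.)
std::size_t output_size = std::mbstowcs(NULL, input, 0);
if (output_size == static_cast<std::size_t>(-1)) { /* invalid multibyte sequence: handle the error */ }
std::vector<wchar_t> output_buffer(output_size + 1); // +1 leaves room for the terminator
std::mbstowcs(&output_buffer[0], input, output_size + 1);
std::wstring output(&output_buffer[0], output_size);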
Once you have two wstrings, just compare as usual. Note that this will use the current system locale for conversion (i.e. on Windows this will be the current "ANSI" codepage) - normally this is just what you want, but occasionally you'll need to deal with a specific encoding, in which case the above won't do, and you'll need to use something like iconv.
EDIT
All other answers seem to go for direct codepoint translation (i.e. the equivalent of (wchar_t)c for every char c in the string). This may not work for all locales, but it will work if e.g. your char are all ASCII or Latin-1, and your wchar_t are Unicode. If you're sure that's what you really want, the fastest way is actually to avoid conversion altogether, and to use std::lexicographical_compare:
#include <algorithm>
#include <cstring>
#include <string>
const char* s = ...;
std::wstring ws = ...;
const char* s_end = s + std::strlen(s); // one past the last character of s
bool is_ws_less_than_s = std::lexicographical_compare(ws.begin(), ws.end(),
s, s_end);
bool is_s_less_than_ws = std::lexicographical_compare(s, s_end,
ws.begin(), ws.end());
bool is_s_equal_to_ws = !is_ws_less_than_s && !is_s_less_than_ws;
If you specifically need to test for equality, use std::equal with a length check:
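#include <algorithm>
#include <cstring>
#include <string>
const char* s = ...;
std::wstring ws = ...;
std::size_t s_len = std::strlen(s);
// Check the lengths first so std::equal never reads past the end of s.
bool are_equal =
ws.length() == s_len &&
std::equal(ws.begin(), ws.end(), s);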
The quick and dirty way is
if( std::wstring(your_char_ptr_string, your_char_ptr_string + strlen(your_char_ptr_string)) == your_wstring)
I say dirty because it creates a temporary string and copies your_char_ptr_string into it. (std::wstring has no constructor that takes a char* directly, hence the pointer pair.) However, it will work just fine as long as you are not in a tight loop.
Note that wstring uses wide characters (16-bit on Windows, typically 32-bit elsewhere), whereas char* tends to hold 8-bit characters (ASCII or Latin-1). They are not the same, so converting wstring to char* might lose information.
-Tom
First of all, you have to ask yourself why you are mixing std::wstring, which holds Unicode text, with char* (C strings), which hold ANSI text. It is best practice to use Unicode because it allows your application to be internationalized, but using a mix doesn't make much sense in most cases. If you want your C strings to be Unicode, use wchar_t. If you want your STL strings to be ANSI, use std::string.
Now back to your question.
The first thing you want to do is convert one of them to match the other datatype.
std::string and std::wstring both have the c_str function.
Here are the function declarations:
const char* std::string::c_str() const
const wchar_t* std::wstring::c_str() const
I don't remember off hand how to convert char * to wchar_t * and vice versa, but after you do that you can use strcmp. If you google you'll find a way.
You could use the functions below to convert std::wstring to std::string then c_str will give you char * which you can strcmp
#include <string>
#include <algorithm>
// Prototype for conversion functions
std::wstring StringToWString(const std::string& s);
std::string WStringToString(const std::wstring& s);
std::wstring StringToWString(const std::string& s)
{
std::wstring temp(s.length(),L' ');
std::copy(s.begin(), s.end(), temp.begin());
return temp;
}
std::string WStringToString(const std::wstring& s)
{
std::string temp(s.length(), ' ');
std::copy(s.begin(), s.end(), temp.begin());
return temp;
}
Convert your wstring to a string.
wstring a = L"foobar";
string b(a.begin(),a.end());
Now you can compare it to any char* using b.c_str() or whatever you like.
char c[] = "foobar";
cout<<strcmp(b.c_str(),c)<<endl;

I want to convert std::string into a const wchar_t *

Is there any method?
My computer is AMD64.
::std::string str;
BOOL loadU(const wchar_t* lpszPathName, int flag = 0);
When I used:
loadU(&str);
the VS2005 compiler says:
Error 7 error C2664:: cannot convert parameter 1 from 'std::string *__w64 ' to 'const wchar_t *'
How can I do it?
First convert it to std::wstring:
std::wstring widestr = std::wstring(str.begin(), str.end());
Then get the C string:
const wchar_t* widecstr = widestr.c_str();
This works only for ASCII strings; it will not work if the underlying string is UTF-8 encoded. Using a conversion routine like MultiByteToWideChar() ensures that this scenario is handled properly.
If you have a std::wstring object, you can call c_str() on it to get a wchar_t*:
std::wstring name( L"Steve Nash" );
const wchar_t* szName = name.c_str();
Since you are operating on a narrow string, however, you would first need to widen it. There are various options here; one is to use Windows' built-in MultiByteToWideChar routine. That will give you an LPWSTR, which is equivalent to wchar_t*.
You can use the ATL text conversion macros to convert a narrow (char) string to a wide (wchar_t) one. For example, to convert a std::string:
#include <atlconv.h>
...
std::string str = "Hello, world!";
CA2W pszWide(str.c_str());
loadU(pszWide);
You can also specify a code page, so if your std::string contains UTF-8 chars you can use:
CA2W pszWide(str.c_str(), CP_UTF8);
Very useful but Windows only.
If you are on Linux/Unix, have a look at mbstowcs() and wcstombs(), provided by the GNU C library (they originate in ISO C90).
mbs stands for "Multi-Byte String" and is basically the usual zero-terminated C string.
wcs stands for "Wide Character String" and is an array of wchar_t.
For more background details on wide chars have a look at glibc documentation here.
I need to pass a wchar_t string to a function, and first I need to create that string from a string literal concatenated with an integer variable.
The original string looks like this, where 4 is the physical drive number, but I want that to be changeable to match whatever drive number I want to pass to the function
auto TargetDrive = L"\\\\.\\PhysicalDrive4";
The following works
int a = 4;
std::string stddrivestring = "\\\\.\\PhysicalDrive" + std::to_string(a);
std::wstring widedrivestring = std::wstring(stddrivestring.begin(), stddrivestring.end());
const wchar_t* TargetDrive = widedrivestring.c_str();
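The detour through std::string can likely be avoided, since std::to_wstring (C++11) produces a wide string directly. A minimal sketch, where physical_drive_path is a made-up helper name:

```cpp
#include <string>

// Hypothetical helper: builds the wide device path for a given drive number.
std::wstring physical_drive_path(int drive_number)
{
    return L"\\\\.\\PhysicalDrive" + std::to_wstring(drive_number);
}
```

The returned std::wstring has to stay alive for as long as the const wchar_t* obtained from its c_str() is in use, so store it in a named variable before passing c_str() to the function.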