How do I convert a char string to a wchar_t string? - c++

I have a string in char* format and would like to convert it to wchar_t*, to pass to a Windows function.

Does this little function help?
#include <cstdlib>
size_t mbstowcs(wchar_t *out, const char *in, size_t size);
Also see the C++ reference
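As a rough illustration of how mbstowcs might be wrapped, here is a minimal sketch. The helper name widen_with_mbstowcs is hypothetical (it is not part of any library), and the conversion depends on the current C locale, so only plain ASCII input is locale-independent.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts a null-terminated multibyte string to a
// std::wstring using the conversion defined by the current C locale.
std::wstring widen_with_mbstowcs(const char* narrow)
{
    assert(narrow != nullptr);
    // Every wide character consumes at least one byte of input, so
    // strlen(narrow) + 1 elements are always enough (terminator included).
    std::vector<wchar_t> buffer(std::strlen(narrow) + 1);
    const std::size_t written =
        std::mbstowcs(buffer.data(), narrow, buffer.size());
    if (written == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence");
    // 'written' does not count the terminating L'\0'.
    return std::wstring(buffer.data(), written);
}
```

The resulting std::wstring can then be handed to a Windows function through c_str().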

If you don't want to link against the C runtime library, use the MultiByteToWideChar API call, e.g.:
const size_t WCHARBUF = 100;
const char szSource[] = "HELLO";
wchar_t wszDest[WCHARBUF];
MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, szSource, -1, wszDest, WCHARBUF);

The Windows SDK specifies two functions in kernel32.lib for converting strings to and from a wide character set: MultiByteToWideChar() and WideCharToMultiByte().
Please note that, unlike what the function names suggest, the narrow string does not necessarily use a multi-byte character set; it can be a simple ANSI string. Also note that these functions treat UTF-7 and UTF-8 as multi-byte character sets. The wide character set is always UTF-16.

schnaader's answer uses the conversion defined by the current C locale; this one uses the C++ locale interface (who said that it was simple?)
#include <cstdio>    // BUFSIZ
#include <locale>
#include <stdexcept>
#include <string>
std::wstring widen(std::string const& s, std::locale loc)
{
std::char_traits<wchar_t>::state_type state = { 0 };
typedef std::codecvt<wchar_t, char, std::char_traits<wchar_t>::state_type >
ConverterFacet;
ConverterFacet const& converter(std::use_facet<ConverterFacet>(loc));
char const* nextToRead = s.data();
wchar_t buffer[BUFSIZ];
wchar_t* nextToWrite;
std::codecvt_base::result result;
std::wstring wresult;
while ((result
= converter.in
(state,
nextToRead, s.data()+s.size(), nextToRead,
buffer, buffer+sizeof(buffer)/sizeof(*buffer), nextToWrite))
== std::codecvt_base::partial)
{
wresult.append(buffer, nextToWrite);
}
if (result == std::codecvt_base::error) {
throw std::runtime_error("Encoding error");
}
wresult.append(buffer, nextToWrite);
return wresult;
}

Related

Take wchar_t and put into char?

I've tried a few things and haven't yet been able to figure out how to get const wchar_t *text (shown below) into the variable StoreText (shown below). What am I doing wrong?
void KeyboardComplete(int localClientNum, const wchar_t *text, unsigned int len)
{
char* StoreText = text; //This is where error occurs
}
You cannot directly assign a wchar_t* to a char*, as they are different and incompatible data types.
If StoreText needs to point at the same memory address that text is pointing at, such as if you are planning on looping through the individual bytes of the text data, then a simple type-cast will suffice:
char* StoreText = (char*)text;
However, if StoreText is expected to point to its own separate copy of the character data, then you would need to convert the wide character data into narrow character data instead, for example by:
using the WideCharToMultiByte() function on Windows:
void KeyboardComplete(int localClientNum, const wchar_t *text, unsigned int len)
{
int StoreTextLen = 1 + WideCharToMultiByte(CP_ACP, 0, text, len, NULL, 0, NULL, NULL);
std::vector<char> StoreTextBuffer(StoreTextLen);
WideCharToMultiByte(CP_ACP, 0, text, len, &StoreTextBuffer[0], StoreTextLen, NULL, NULL);
char* StoreText = &StoreTextBuffer[0];
//...
}
using the std::wcsrtombs() function:
#include <cwchar>
void KeyboardComplete(int localClientNum, const wchar_t *text, unsigned int len)
{
std::mbstate_t state = std::mbstate_t();
int StoreTextLen = 1 + std::wcsrtombs(NULL, &text, 0, &state);
std::vector<char> StoreTextBuffer(StoreTextLen);
std::wcsrtombs(&StoreTextBuffer[0], &text, StoreTextLen, &state);
char *StoreText = &StoreTextBuffer[0];
//...
}
using the std::wstring_convert class (C++11 and later):
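#include <locale>
#include <string>
// std::codecvt has a protected destructor, so std::wstring_convert can only use it through a small wrapper
template <class Facet>
struct deletable_facet : Facet
{
using Facet::Facet; // inherit the constructors
~deletable_facet() {}
};
void KeyboardComplete(int localClientNum, const wchar_t *text, unsigned int len)
{
std::wstring_convert<deletable_facet<std::codecvt<wchar_t, char, std::mbstate_t>>> conv;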
std::string StoreTextBuffer = conv.to_bytes(text, text+len);
char *StoreText = &StoreTextBuffer[0];
//...
}
using similar conversions from the ICONV or ICU library.
First of all, for strings you should use std::wstring/std::string instead of raw pointers.
The C++11 Locale (http://en.cppreference.com/w/cpp/locale) library can be used to convert wide string to narrow string.
I wrote a wrapper function below and have used it for years. Hope it will be helpful to you, too.
#include <string>
#include <locale>
#include <codecvt>
std::string WstringToString(const std::wstring & wstr, const std::locale & loc /*= std::locale()*/)
{
std::string buf(wstr.size(), 0);
std::use_facet<std::ctype<wchar_t>>(loc).narrow(wstr.c_str(), wstr.c_str() + wstr.size(), '?', &buf[0]);
return buf;
}
wchar_t is a wide character. It is typically 16 or 32 bits per character, but this is system dependent.
char is a good ol' CHAR_BIT-sized data type. Again, how big it is is system dependent. Most likely it's going to be 8 bits, but I can't think of a reason why CHAR_BIT can't be 16 or 32 bits, making it the same size as wchar_t.
If they are different sizes, a direct assignment is doomed. For example an 8 bit char will see 2 characters, and quite likely 2 completely unrelated characters, for every 1 character in a 16 bit wchar_t. This would be bad.
Second, even if they are the same size, they may have different encodings. For example, the numeric value assigned to the letter 'A' may be different for the char and the wchar_t. It could be 65 in char and 16640 in wchar_t.
For the text to make any sense in the other data type, it has to be translated into that type's encoding. std::wstring_convert will often perform this translation for you, but look into the locale library for more complicated translations. Both require a compiler supporting C++11 or better. In previous C++ Standards, a small army of functions provided conversion support. Third-party libraries such as Boost.Locale help to unify these and provide wider support.
Conversion functions are supplied by the operating system to translate between the encoding used by the OS and other common encodings.
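To make the size mismatch described above concrete, here is a small sketch that exposes the raw bytes behind a wide string, which is what a char* obtained by a plain cast would point at. The helper name raw_bytes is hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: copies the raw storage of a wide string into a byte
// vector, i.e. the bytes a char* produced by a cast would walk over.
std::vector<unsigned char> raw_bytes(const std::wstring& ws)
{
    std::vector<unsigned char> bytes(ws.size() * sizeof(wchar_t));
    if (!bytes.empty())
        std::memcpy(bytes.data(), ws.data(), bytes.size());
    return bytes;
}
```

On a system where wchar_t is wider than char, raw_bytes(L"AB") holds sizeof(wchar_t) bytes per character, and for ASCII text most of them are zero. Code that treats those bytes as a C string would therefore typically stop at the first zero byte instead of seeing "AB".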
You have to do a cast, you can do this:
char* StoreText = (char*)text;
I think this may work.
But you can use the wcstombs function from the cstdlib library:
char someText[12];
wcstombs(someText, text, 12);
The last parameter must be the number of bytes available in the destination array.

convert std::wstring to const *char in c++

How can I convert std::wstring to const *char in C++?
You can convert a std::wstring to a const wchar_t * using the c_str member function :
std::wstring wStr;
const wchar_t *str = wStr.c_str();
However, a conversion to a const char * isn't natural: it requires an additional call to std::wcstombs, for example:
#include <cstdlib>
#include <cwchar>
// ...
std::wstring wStr;
const wchar_t *input = wStr.c_str();
// Estimate the required buffer size (plus one for the null terminator).
size_t size = (wcslen(input) + 1) * sizeof(wchar_t);
char *buffer = new char[size];
#ifdef __STDC_LIB_EXT1__
// wcstombs_s is only guaranteed to be available if __STDC_LIB_EXT1__ is defined
size_t convertedSize;
wcstombs_s(&convertedSize, buffer, size, input, size - 1);
#else
std::wcstombs(buffer, input, size);
#endif
/* Use the string stored in "buffer" variable */
// Free allocated memory (array form, to match new[]):
delete[] buffer;
You cannot do this just like that. std::wstring represents a string of wide (Unicode) characters, while char* in this case is a string of ASCII characters. There has to be a code page conversion from Unicode to ASCII.
To make the conversion you can use standard library functions such as wcstombs, or Windows' WideCharToMultiByte function.
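A variant of the wcstombs approach that avoids manual new/delete could look roughly like this. It is a sketch under the assumption that the current C locale can represent the input; the helper name narrow_with_wcstombs is hypothetical.

```cpp
#include <climits>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts a wide string to a multibyte std::string
// using the conversion defined by the current C locale.
std::string narrow_with_wcstombs(const std::wstring& wide)
{
    // One wide character never needs more than MB_LEN_MAX bytes,
    // so this buffer is always large enough (terminator included).
    std::vector<char> buffer(wide.size() * MB_LEN_MAX + 1);
    const std::size_t written =
        std::wcstombs(buffer.data(), wide.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1))
        throw std::runtime_error("wide character not representable in the current locale");
    // 'written' does not count the terminating '\0'.
    return std::string(buffer.data(), written);
}
```

The returned std::string owns its storage, and c_str() yields the const char * the question asks for. Note that the conversion stops at the first embedded L'\0', if any.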
Updated to incorporate information from comments, thanks for pointing that out.

How do I convert wchar_t* to std::string?

I changed my class to use std::string (based on the answer I got here), but a function I have returns wchar_t *. How do I convert it to std::string?
I tried this:
std::string test = args.OptionArg();
but it says error C2440: 'initializing' : cannot convert from 'wchar_t *' to 'std::basic_string<_Elem,_Traits,_Ax>'
std::wstring ws( args.OptionArg() );
std::string test( ws.begin(), ws.end() );
You can convert a wide char string to an ASCII string using the following function:
#include <locale>
#include <sstream>
#include <string>
std::string ToNarrow( const wchar_t *s, char dfault = '?',
const std::locale& loc = std::locale() )
{
std::ostringstream stm;
while( *s != L'\0' ) {
stm << std::use_facet< std::ctype<wchar_t> >( loc ).narrow( *s++, dfault );
}
return stm.str();
}
Be aware that this will just replace any wide character for which an equivalent ASCII character doesn't exist with the dfault parameter; it doesn't convert from UTF-16 to UTF-8. If you want to convert to UTF-8 use a library such as ICU.
This is an old question, but if you're not really seeking conversions and are instead using the TCHAR machinery from Microsoft to build for both ASCII and Unicode, recall that std::string is really
typedef std::basic_string<char> string;
So we could define our own typedef, say
#include <string>
#include <tchar.h> // defines TCHAR
namespace magic {
typedef std::basic_string<TCHAR> string;
}
Then you could use magic::string with TCHAR, LPCTSTR, and so forth
It's rather disappointing that none of the answers given to this old question addresses the problem of converting wide strings into UTF-8 strings, which is important in non-English environments.
Here's example code that works and may be used as a hint for constructing custom converters. It is based on an example from cppreference.com.
#include <iostream>
#include <clocale>
#include <string>
#include <cstdlib>
#include <array>
#include <stdexcept>
std::string convert(const std::wstring& wstr)
{
const int BUFF_SIZE = 7;
if (MB_CUR_MAX >= BUFF_SIZE) throw std::invalid_argument("BUFF_SIZE too small");
std::string result;
bool shifts = std::wctomb(nullptr, 0); // reset the conversion state
for (const wchar_t wc : wstr)
{
std::array<char, BUFF_SIZE> buffer;
const int ret = std::wctomb(buffer.data(), wc);
if (ret < 0) throw std::invalid_argument("inconvertible wide characters in the current locale");
buffer[ret] = '\0'; // make 'buffer' contain a C-style string
result.append(buffer.data(), ret); // append exactly the bytes written for this character
}
return result;
}
int main()
{
auto loc = std::setlocale(LC_ALL, "en_US.utf8"); // UTF-8
if (loc == nullptr) throw std::logic_error("failed to set locale");
std::wstring wstr = L"aąß水𝄋-扫描-€𐍈\u00df\u6c34\U0001d10b";
std::cout << convert(wstr) << "\n";
}
This prints, as expected:
aąß水𝄋-扫描-€𐍈ß水𝄋
Explanation
7 seems to be the minimal secure value of the buffer size, BUFF_SIZE. This includes 4 as the maximum number of UTF-8 bytes encoding a single character; 2 for the possible "shift sequence", 1 for the trailing '\0'.
MB_CUR_MAX is a run-time variable, so static_assert is not usable here
Each wide character is translated into its char representation using std::wctomb
This conversion makes sense only if the current locale allows multi-byte representations of a character
For this to work, the application needs to set the proper locale. en_US.utf8 seems to be sufficiently universal (available on most machines). In Linux, available locales can be queried in the console via locale -a command.
Critique of the most upvoted answer
The most upvoted answer,
std::wstring ws( args.OptionArg() );
std::string test( ws.begin(), ws.end() );
works well only when the wide characters represent ASCII characters - but these are not what wide characters were designed for. In this solution, the converted string contains one char per each source wide char, ws.size() == test.size(). Thus, it loses information from the original wstring and produces strings that cannot be interpreted as proper UTF-8 sequences. For example, on my machine the string resulting from this simplistic conversion of "ĄŚĆII" prints as "ZII", even though its size is 5 (and should be 8).
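For comparison, a conversion that does not depend on the locale at all can be sketched by encoding each code point by hand. This is only an illustration: the names append_utf8 and wide_to_utf8 are hypothetical, and it assumes the wide string holds UTF-32 code points (32-bit wchar_t) or UTF-16 code units (16-bit wchar_t).

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of one code point.
void append_utf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x110000) {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        throw std::invalid_argument("code point out of range");
    }
}

// Hypothetical helper: converts a wide string to UTF-8.
std::string wide_to_utf8(const std::wstring& ws)
{
    std::string out;
    for (std::size_t i = 0; i < ws.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(ws[i]);
        // With a 16-bit wchar_t, characters outside the BMP arrive as a
        // surrogate pair; combine the pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < ws.size()) {
            const std::uint32_t low = static_cast<std::uint32_t>(ws[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        if (cp >= 0xD800 && cp <= 0xDFFF)
            throw std::invalid_argument("unpaired surrogate");
        append_utf8(out, cp);
    }
    return out;
}
```

Unlike the one-char-per-wide-char copy, the result grows to as many bytes as UTF-8 requires, so test.size() generally differs from ws.size() for non-ASCII input.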
You could just use wstring and keep everything in Unicode
just for fun :-):
const wchar_t* val = L"hello mfc";
std::string test((LPCTSTR)CString(val));
The following code is more concise:
wchar_t wstr[500];
char string[500];
sprintf(string,"%ls",wstr);

String comparisons. How can you compare string with std::wstring? WRT strcmp

I am trying to compare two formats that I expected would be somewhat compatible, since they are both generally strings. I have tried to perform strcmp with a string and std::wstring, and as I'm sure C++ gurus know, this will simply not compile. Is it possible to compare these two types? Is there an easy conversion here?
You need to convert your char* string - "multibyte" in ISO C parlance - to a wchar_t* string - "wide character" in ISO C parlance. The standard function that does that is called mbstowcs ("Multi-Byte String To Wide Character String")
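NOTE: mbstowcs has been part of the C standard library since C89 and is available in ISO C++ as std::mbstowcs in <cstdlib>; MSVC and g++ both support it. Passing NULL as the destination to obtain the length is a common extension (specified by POSIX and supported by MSVC) rather than ISO C.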
It is used thus:
const char* input = ...;
std::size_t output_size = std::mbstowcs(NULL, input, 0); // get length, not counting the terminator
// output_size is (std::size_t)-1 if input contains an invalid multibyte sequence; check for that before continuing
std::vector<wchar_t> output_buffer(output_size + 1); // +1 for the terminating L'\0'
std::mbstowcs(&output_buffer[0], input, output_buffer.size());
std::wstring output(&output_buffer[0], output_size);
Once you have two wstrings, just compare as usual. Note that this will use the current system locale for conversion (i.e. on Windows this will be the current "ANSI" codepage) - normally this is just what you want, but occasionally you'll need to deal with a specific encoding, in which case the above won't do, and you'll need to use something like iconv.
EDIT
All other answers seem to go for direct codepoint translation (i.e. the equivalent of (wchar_t)c for every char c in the string). This may not work for all locales, but it will work if e.g. your char are all ASCII or Latin-1, and your wchar_t are Unicode. If you're sure that's what you really want, the fastest way is actually to avoid conversion altogether, and to use std::lexicographical_compare:
#include <algorithm>
#include <cstring>
const char* s = ...;
std::wstring ws = ...;
const char* s_end = s + strlen(s);
bool is_ws_less_than_s = std::lexicographical_compare(ws.begin(), ws.end(),
s, s_end);
bool is_s_less_than_ws = std::lexicographical_compare(s, s_end,
ws.begin(), ws.end());
bool is_s_equal_to_ws = !is_ws_less_than_s && !is_s_less_than_ws;
If you specifically need to test for equality, use std::equal with a length check:
#include <algorithm>
#include <cstring>
const char* s = ...;
std::wstring ws = ...;
std::size_t s_len = strlen(s);
bool are_equal =
ws.length() == s_len &&
std::equal(ws.begin(), ws.end(), s);
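The quick and dirty way is
if( std::wstring(your_char_ptr_string, your_char_ptr_string + strlen(your_char_ptr_string)) == your_wstring)
std::wstring cannot be constructed directly from a char*, so the iterator-range constructor is used here. I say dirty because it will create a temporary string and copy your_char_ptr_string into it. However, it will work just fine as long as you are not in a tight loop.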
Note that wstring typically uses 16- or 32-bit characters, depending on the platform (i.e. Unicode), whereas char* tends to be 8-bit characters (ASCII or Latin-1). They are not the same, so wstring --> char* might lose information.
-Tom
First of all, you have to ask yourself why you are mixing std::wstring, which is a Unicode format, with char* (C string), which is ANSI. It is best practice to use Unicode because it allows your application to be internationalized, but using a mix doesn't make much sense in most cases. If you want your C strings to be Unicode, use wchar_t. If you want your STL strings to be ANSI, use std::string.
Now back to your question.
The first thing you want to do is convert one of them to match the other datatype.
Both std::string and std::wstring have a c_str function.
Here are the function definitions:
const char* std::string::c_str() const
const wchar_t* std::wstring::c_str() const
I don't remember off hand how to convert char * to wchar_t * and vice versa, but after you do that you can use strcmp. If you google you'll find a way.
You could use the functions below to convert std::wstring to std::string then c_str will give you char * which you can strcmp
#include <string>
#include <algorithm>
// Prototype for conversion functions
std::wstring StringToWString(const std::string& s);
std::string WStringToString(const std::wstring& s);
std::wstring StringToWString(const std::string& s)
{
std::wstring temp(s.length(),L' ');
std::copy(s.begin(), s.end(), temp.begin());
return temp;
}
std::string WStringToString(const std::wstring& s)
{
std::string temp(s.length(), ' ');
std::copy(s.begin(), s.end(), temp.begin());
return temp;
}
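One caveat with element-wise copying: where char is a signed type, bytes at or above 0x80 are negative and may be sign-extended when converted to wchar_t. A minimal sketch that avoids this, assuming the narrow string is Latin-1 (the helper name widen_latin1 is hypothetical):

```cpp
#include <string>

// Hypothetical helper: widens a Latin-1 string one byte at a time.
std::wstring widen_latin1(const std::string& s)
{
    std::wstring result;
    result.reserve(s.size());
    for (char c : s) {
        // Go through unsigned char so that bytes >= 0x80 map to
        // U+0080..U+00FF instead of being sign-extended.
        result.push_back(static_cast<wchar_t>(static_cast<unsigned char>(c)));
    }
    return result;
}
```

For pure ASCII input this behaves the same as the std::copy version above.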
Convert your wstring to a string.
wstring a = L"foobar";
string b(a.begin(),a.end());
Now you can compare it to any char* using b.c_str() or whatever you like.
char c[] = "foobar";
cout<<strcmp(b.c_str(),c)<<endl;