I have a string in char* format and would like to convert it to wchar_t*, to pass to a Windows function.
Does this little function help?
#include <cstdlib>
size_t mbstowcs(wchar_t *out, const char *in, size_t size);
Also see the C++ reference
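As a minimal sketch of how that function is typically called, the helper below (the name to_wide is made up for illustration) first asks for the required length and then converts into a buffer of that size. It assumes the input is encoded in the current C locale, and it relies on the POSIX/MSVC behaviour that a null destination makes mbstowcs return the required length.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts a multibyte string to a wide string
// using the encoding of the current C locale.
std::wstring to_wide(const char* in)
{
    // With a null destination, mbstowcs returns the number of wide
    // characters required, not counting the terminating null
    // (a POSIX and MSVC extension, assumed here).
    const std::size_t len = std::mbstowcs(nullptr, in, 0);
    if (len == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence");

    std::vector<wchar_t> buf(len + 1);          // +1 for the terminator
    std::mbstowcs(buf.data(), in, buf.size());
    return std::wstring(buf.data(), len);
}
```

The resulting std::wstring can be handed to a Windows function through c_str() when the function takes a const wchar_t*.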
If you don't want to link against the C runtime library, use the MultiByteToWideChar API call, e.g.:
const size_t WCHARBUF = 100;
const char szSource[] = "HELLO";
wchar_t wszDest[WCHARBUF];
MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, szSource, -1, wszDest, WCHARBUF);
The Windows SDK provides two functions in kernel32.lib for converting strings to and from a wide character set: MultiByteToWideChar() and WideCharToMultiByte().
Please note that, unlike what the function name suggests, the narrow string does not necessarily use a multi-byte character set; it can be a simple ANSI string. Also note that these functions accept UTF-7 and UTF-8 as multi-byte character sets. The wide character set is always UTF-16.
schnaader's answer uses the conversion defined by the current C locale; this one uses the C++ locale interface (who said that it was simple?)
#include <cstdio>      // BUFSIZ
#include <locale>
#include <stdexcept>
#include <string>
std::wstring widen(std::string const& s, std::locale loc)
{
    std::char_traits<wchar_t>::state_type state = { 0 };
    typedef std::codecvt<wchar_t, char, std::char_traits<wchar_t>::state_type>
        ConverterFacet;
    ConverterFacet const& converter(std::use_facet<ConverterFacet>(loc));
    char const* nextToRead = s.data();
    wchar_t buffer[BUFSIZ];
    wchar_t* nextToWrite;
    std::codecvt_base::result result;
    std::wstring wresult;
    // "partial": the output buffer is full (or the input ended mid-character),
    // so flush what has been converted so far and keep going
    while ((result
            = converter.in
              (state,
               nextToRead, s.data()+s.size(), nextToRead,
               buffer, buffer+sizeof(buffer)/sizeof(*buffer), nextToWrite))
           == std::codecvt_base::partial)
    {
        wresult.append(buffer, nextToWrite);
    }
    if (result == std::codecvt_base::error) {
        throw std::runtime_error("Encoding error");
    }
    wresult.append(buffer, nextToWrite);
    return wresult;
}
I've tried a few things and haven't yet been able to figure out how to get const wchar_t *text (shown below) into the variable StoreText (also shown below). What am I doing wrong?
void KeyboardComplete(int localClientNum, const wchar_t *text, unsigned int len)
{
char* StoreText = text; //This is where error occurs
}
You cannot directly assign a wchar_t* to a char*, as they are different and incompatible data types.
If StoreText needs to point at the same memory address that text is pointing at, such as if you are planning on looping through the individual bytes of the text data, then a simple type-cast will suffice:
char* StoreText = (char*)text;
However, if StoreText is expected to point to its own separate copy of the character data, then you would need to convert the wide character data into narrow character data instead. Such as by:
using the WideCharToMultiByte() function on Windows:
void KeyboardComplete(int localClientNum, const wchar_t *text, unsigned int len)
{
int StoreTextLen = 1 + WideCharToMultiByte(CP_ACP, 0, text, len, NULL, 0, NULL, NULL);
std::vector<char> StoreTextBuffer(StoreTextLen);
WideCharToMultiByte(CP_ACP, 0, text, len, &StoreTextBuffer[0], StoreTextLen, NULL, NULL);
char* StoreText = &StoreTextBuffer[0];
//...
}
using the std::wcsrtombs() function:
#include <cwchar>
#include <vector>
void KeyboardComplete(int localClientNum, const wchar_t *text, unsigned int len)
{
std::mbstate_t state = std::mbstate_t();
int StoreTextLen = 1 + std::wcsrtombs(NULL, &text, 0, &state);
std::vector<char> StoreTextBuffer(StoreTextLen);
std::wcsrtombs(&StoreTextBuffer[0], &text, StoreTextLen, &state);
char *StoreText = &StoreTextBuffer[0];
//...
}
using the std::wstring_convert class (C++11 and later):
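#include <locale>
#include <string>
// std::codecvt has a protected destructor, so wstring_convert needs a derived facet it can destroy
struct DeletableCodecvt : std::codecvt<wchar_t, char, std::mbstate_t>
{
    ~DeletableCodecvt() {}
};
void KeyboardComplete(int localClientNum, const wchar_t *text, unsigned int len)
{
    std::wstring_convert<DeletableCodecvt> conv;
    std::string StoreTextBuffer = conv.to_bytes(text, text+len);
    char *StoreText = &StoreTextBuffer[0];
    //...
}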
using similar conversions from the ICONV or ICU library.
First of all, for strings you should use std::wstring/std::string instead of raw pointers.
The C++11 Locale (http://en.cppreference.com/w/cpp/locale) library can be used to convert wide string to narrow string.
I wrote a wrapper function below and have used it for years. Hope it will be helpful to you, too.
#include <string>
#include <locale>
#include <codecvt>
std::string WstringToString(const std::wstring & wstr, const std::locale & loc /*= std::locale()*/)
{
std::string buf(wstr.size(), 0);
std::use_facet<std::ctype<wchar_t>>(loc).narrow(wstr.c_str(), wstr.c_str() + wstr.size(), '?', &buf[0]);
return buf;
}
wchar_t is a wide character. It is typically 16 or 32 bits per character, but this is system dependent.
char is a good ol' CHAR_BIT-sized data type. Again, how big it is is system dependent. Most likely it's going to be one byte, but I can't think of a reason why CHAR_BIT can't be 16 or 32 bits, making it the same size as wchar_t.
If they are different sizes, a direct assignment is doomed. For example an 8 bit char will see 2 characters, and quite likely 2 completely unrelated characters, for every 1 character in a 16 bit wchar_t. This would be bad.
Second, even if they are the same size, they may have different encodings. For example, the numeric value assigned to the letter 'A' may be different for the char and the wchar_t. It could be 65 in char and 16640 in wchar_t.
To make any sense as the other data type, char and wchar_t data needs to be translated to the other's encoding. std::wstring_convert will often perform this translation for you, but look into the locale library for more complicated translations. Both require a compiler supporting C++11 or better. In previous C++ Standards, a small army of functions provided conversion support. Third-party libraries such as Boost.Locale are helpful to unify and provide wider support.
Conversion functions are supplied by the operating system to translate between the encoding used by the OS and other common encodings.
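To make the size mismatch described above concrete, here is a small sketch (raw_bytes is a made-up helper name) that returns the bytes a char* would actually see when pointed at a single wide character.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the raw bytes that a char* would see
// when pointed at a single wide character.
std::vector<unsigned char> raw_bytes(wchar_t wc)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&wc);
    return std::vector<unsigned char>(p, p + sizeof(wchar_t));
}
```

On a little-endian system with a 16-bit wchar_t, raw_bytes(L'A') holds 0x41 followed by 0x00, so code that treats the memory as a C string would stop after the first character.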
You have to do a cast, you can do this:
char* StoreText = (char*)text;
I think this may work.
But you can also use the wcstombs function from the cstdlib library.
char someText[12];
wcstombs(someText, text, 12);
The last parameter must be the number of bytes available in the array pointed to.
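A slightly more defensive sketch of the wcstombs call follows; narrow_fixed is a made-up helper name and the 12-byte buffer just mirrors the size used above. It checks the return value and reserves one byte for the terminator, because wcstombs does not write one when the buffer fills up.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: narrows `text` into a fixed-size buffer using the
// current C locale. Returns an empty string if a character cannot be converted.
std::string narrow_fixed(const wchar_t* text)
{
    char buf[12];
    // Reserve the last byte so the result can always be null-terminated:
    // wcstombs does not write a terminator when it fills the buffer.
    const std::size_t n = std::wcstombs(buf, text, sizeof buf - 1);
    if (n == static_cast<std::size_t>(-1))
        return std::string();
    buf[n] = '\0';
    return std::string(buf, n);
}
```

Input longer than the buffer is silently truncated here; a real program would probably size the buffer from the input instead.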
How can I convert a narrow string to a wide string?
I have tried this method :
string myName;
getline( cin , myName );
wstring printerName( L(myName) ); // error C3861: 'L': identifier not found
wchar_t* WprinterName = printerName.c_str(); // error C2440: 'initializing' : cannot convert from 'const wchar_t *' to 'wchar_t *'
But I get the errors listed above.
Why do I get these errors? How can I fix them?
Is there any other method of directly converting a narrow string to a wide string?
If the source is ASCII encoded, you can just do this:
wstring printerName;
printerName.assign( myName.begin(), myName.end() );
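One caveat with a plain element-wise copy: where char is signed, bytes above 0x7F would be sign-extended when widened. A small sketch that avoids this by going through unsigned char (widen_bytes is a made-up name, and the result is only meaningful for ASCII or Latin-1 input):

```cpp
#include <string>

// Hypothetical helper: widens a string by mapping each byte to the
// wide character with the same numeric value.
std::wstring widen_bytes(const std::string& s)
{
    std::wstring out;
    out.reserve(s.size());
    for (char c : s)
        // Go through unsigned char so bytes above 0x7F are not sign-extended.
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(c)));
    return out;
}
```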
You should do this :
inline std::wstring convert( const std::string& as )
{
// deal with trivial case of empty string
if( as.empty() ) return std::wstring();
// determine required length of new string
size_t reqLength = ::MultiByteToWideChar( CP_UTF8, 0, as.c_str(), (int)as.length(), 0, 0 );
// construct new string of required length
std::wstring ret( reqLength, L'\0' );
// convert old string to new string
::MultiByteToWideChar( CP_UTF8, 0, as.c_str(), (int)as.length(), &ret[0], (int)ret.length() );
// return new string ( compiler should optimize this away )
return ret;
}
This expects the std::string to be UTF-8 (CP_UTF8); if you have another encoding, replace the code page.
Another way could be :
inline std::wstring convert( const std::string& as )
{
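// Note: %S (a narrow string inside a wide format) is Microsoft-specific
const size_t bufLen = as.size() * 2 + 2;
wchar_t* buf = new wchar_t[bufLen];
swprintf( buf, bufLen, L"%S", as.c_str() );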
std::wstring rval = buf;
delete[] buf;
return rval;
}
I found this while googling the problem. I have pasted the code for reference. Author of this post is Paul McKenzie.
std::string str = "Hello";
std::wstring str2(str.length(), L' '); // Make room for characters
// Copy string to wstring.
std::copy(str.begin(), str.end(), str2.begin());
ATL (non-express editions of Visual Studio) has a couple useful class types which can convert the strings plainly. You can use the constructor directly, if you do not need to hold onto the string.
#include <atlbase.h>
std::wstring wideString(L"My wide string");
std::string narrowString("My not-so-wide string");
ATL::CW2A narrow(wideString.c_str()); // narrow is a narrow string
ATL::CA2W wide(narrowString.c_str()); // wide is a wide string
Here are two functions that can be used: mbstowcs_s and wcstombs_s.
mbstowcs_s: Converts a sequence of multibyte characters to a corresponding sequence of wide characters.
wcstombs_s: Converts a sequence of wide characters to a corresponding sequence of multibyte characters.
errno_t wcstombs_s(
size_t *pReturnValue,
char *mbstr,
size_t sizeInBytes,
const wchar_t *wcstr,
size_t count
);
errno_t mbstowcs_s(
size_t *pReturnValue,
wchar_t *wcstr,
size_t sizeInWords,
const char *mbstr,
size_t count
);
See http://msdn.microsoft.com/en-us/library/eyktyxsx.aspx and http://msdn.microsoft.com/en-us/library/s7wzt4be.aspx.
The Windows API provides routines for doing this: WideCharToMultiByte() and MultiByteToWideChar(). However, they are a pain to use. Each conversion requires two calls to the routines and you have to look after allocating/freeing memory and making sure the strings are correctly terminated. You need a wrapper!
I have a convenient C++ wrapper on my blog, here, which you are welcome to use.
The original question of this thread was: "How can i convert a narrow string to a wide string?"
However, from the example code given in the question, there seems to be no conversion necessary. Rather, there is a compiler error due to the newer compilers deprecating something that used to be okay. Here is what I think is going on:
// wchar_t* wstr = L"A wide string"; // Error: cannot convert from 'const wchar_t *' to 'wchar_t *'
wchar_t const* wstr = L"A wide string"; // okay
const wchar_t* wstr_equivalent = L"A wide string"; // also okay
c_str() returns a const wchar_t*, so, just like a literal, its result has to be stored in a pointer to const. You could use a cast, but adding const is preferable.
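If a writable wchar_t* really is needed, a cast is not the only option. The sketch below (capitalize_first is a made-up helper) keeps the c_str() result in a pointer to const and takes the address of the first element when the buffer has to be modified.

```cpp
#include <string>

// Hypothetical helper: upper-cases the first character in place to show
// the difference between the read-only and the writable pointer.
void capitalize_first(std::wstring& s)
{
    const wchar_t* read_only = s.c_str();   // fine: c_str() returns const wchar_t*
    if (*read_only == L'\0')
        return;                             // empty string, nothing to modify
    wchar_t* writable = &s[0];              // non-const access to the same buffer
    if (*writable >= L'a' && *writable <= L'z')
        *writable = static_cast<wchar_t>(*writable - L'a' + L'A');
}
```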
The best answer I have seen for converting between wide and narrow strings is to use std::wstringstream. And this is one of the answers given to C++ Convert string (or char*) to wstring (or wchar_t*)
You can convert most anything to and from strings and wide strings using stringstream and wstringstream.
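A minimal sketch of the stream approach, assuming ASCII input (widen_via_stream is a made-up name). Inserting a narrow C string into a wide stream widens it character by character through the stream's locale, so this is not a real encoding conversion.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: widens a narrow string by inserting it into a
// wide output stream.
std::wstring widen_via_stream(const std::string& s)
{
    std::wostringstream out;
    // Inserting a const char* into a wide stream widens each char with the
    // stream locale's ctype facet; it is not aware of multibyte encodings.
    out << s.c_str();
    return out.str();
}
```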
This article, published in the September 2016 issue of MSDN Magazine, discusses the conversion in detail using Win32 APIs.
Note that using MultiByteToWideChar() is much faster than using the std:: stuff on Windows.
Use mbstowcs():
string myName;
wchar_t wstr[BUFFER_SIZE];
getline( cin , myName );
// mbtowc() converts only a single character; mbstowcs() converts the whole string
mbstowcs(wstr, myName.c_str(), BUFFER_SIZE);
I changed my class to use std::string (based on the answer I got here), but a function I have returns wchar_t *. How do I convert it to std::string?
I tried this:
std::string test = args.OptionArg();
but it says error C2440: 'initializing' : cannot convert from 'wchar_t *' to 'std::basic_string<_Elem,_Traits,_Ax>'
std::wstring ws( args.OptionArg() );
std::string test( ws.begin(), ws.end() );
You can convert a wide char string to an ASCII string using the following function:
#include <locale>
#include <sstream>
#include <string>
std::string ToNarrow( const wchar_t *s, char dfault = '?',
const std::locale& loc = std::locale() )
{
std::ostringstream stm;
while( *s != L'\0' ) {
stm << std::use_facet< std::ctype<wchar_t> >( loc ).narrow( *s++, dfault );
}
return stm.str();
}
Be aware that this will just replace any wide character for which an equivalent ASCII character doesn't exist with the dfault parameter; it doesn't convert from UTF-16 to UTF-8. If you want to convert to UTF-8 use a library such as ICU.
This is an old question, but if you're not really seeking conversions and are instead using the TCHAR stuff from Microsoft to be able to build both ASCII and Unicode, you could recall that std::string is really
typedef std::basic_string<char> string
So we could define our own typedef, say
#include <string>
namespace magic {
typedef std::basic_string<TCHAR> string;
}
Then you could use magic::string with TCHAR, LPCTSTR, and so forth
It's rather disappointing that none of the answers given to this old question addresses the problem of converting wide strings into UTF-8 strings, which is important in non-English environments.
Here's example code that works and may be used as a hint to construct custom converters. It is based on example code from cppreference.com.
#include <iostream>
#include <clocale>
#include <string>
#include <cstdlib>
#include <array>
#include <stdexcept>
std::string convert(const std::wstring& wstr)
{
const int BUFF_SIZE = 7;
if (MB_CUR_MAX >= BUFF_SIZE) throw std::invalid_argument("BUFF_SIZE too small");
std::string result;
bool shifts = std::wctomb(nullptr, 0); // reset the conversion state
for (const wchar_t wc : wstr)
{
std::array<char, BUFF_SIZE> buffer;
const int ret = std::wctomb(buffer.data(), wc);
if (ret < 0) throw std::invalid_argument("inconvertible wide characters in the current locale");
buffer[ret] = '\0'; // make 'buffer' contain a C-style string
result = result + std::string(buffer.data());
}
return result;
}
int main()
{
auto loc = std::setlocale(LC_ALL, "en_US.utf8"); // UTF-8
if (loc == nullptr) throw std::logic_error("failed to set locale");
std::wstring wstr = L"aąß水𝄋-扫描-€𐍈\u00df\u6c34\U0001d10b";
std::cout << convert(wstr) << "\n";
}
This prints, as expected:
Explanation
7 seems to be the minimal secure value of the buffer size, BUFF_SIZE. This includes 4 as the maximum number of UTF-8 bytes encoding a single character; 2 for the possible "shift sequence", 1 for the trailing '\0'.
MB_CUR_MAX is a run-time variable, so static_assert is not usable here
Each wide character is translated into its char representation using std::wctomb
This conversion makes sense only if the current locale allows multi-byte representations of a character
For this to work, the application needs to set the proper locale. en_US.utf8 seems to be sufficiently universal (available on most machines). In Linux, available locales can be queried in the console via locale -a command.
Critique of the most upvoted answer
The most upvoted answer,
std::wstring ws( args.OptionArg() );
std::string test( ws.begin(), ws.end() );
works well only when the wide characters represent ASCII characters - but these are not what wide characters were designed for. In this solution, the converted string contains one char per each source wide char, ws.size() == test.size(). Thus, it loses information from the original wstring and produces strings that cannot be interpreted as proper UTF-8 sequences. For example, on my machine the string resulting from this simplistic conversion of "ĄŚĆII" prints as "ZII", even though its size is 5 (and should be 8).
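The information loss is easy to reproduce. In the sketch below (narrow_by_truncation is a made-up wrapper around the criticised construct), every wide character is squeezed into one char, so on typical implementations only its low byte survives.

```cpp
#include <string>

// The element-wise copy criticised above, wrapped in a hypothetical helper.
std::string narrow_by_truncation(const std::wstring& ws)
{
    // Each wchar_t is converted to a single char, so the high bits are dropped.
    return std::string(ws.begin(), ws.end());
}
```

For instance, Ś is U+015A; dropping the high byte leaves 0x5A, which is the ASCII code of 'Z'.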
You could just use wstring and keep everything in Unicode
just for fun :-):
const wchar_t* val = L"hello mfc";
std::string test((LPCTSTR)CString(val));
Following code is more concise:
wchar_t wstr[500];
char string[500];
sprintf(string,"%ls",wstr);
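A bounded variant of the same idea, as a sketch: narrow_via_printf is a made-up name, the 500-byte buffer mirrors the one above, and the conversion done by "%ls" depends on the current C locale.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats a wide string into a narrow one with "%ls",
// which converts using the current C locale.
std::string narrow_via_printf(const wchar_t* wstr)
{
    char buf[500];
    // snprintf never writes more than sizeof buf bytes, including the terminator.
    const int n = std::snprintf(buf, sizeof buf, "%ls", wstr);
    if (n < 0)
        return std::string();       // conversion failed
    return std::string(buf);
}
```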
I am trying to compare two formats that I expected would be somewhat compatible, since they are both generally strings. I have tried to perform strcmp with a string and std::wstring, and as I'm sure C++ gurus know, this will simply not compile. Is it possible to compare these two types? Is there an easy conversion here?
You need to convert your char* string - "multibyte" in ISO C parlance - to a wchar_t* string - "wide character" in ISO C parlance. The standard function that does that is called mbstowcs ("Multi-Byte String To Wide Character String")
NOTE: std::mbstowcs is declared in <cstdlib> (<stdlib.h> in C). MSVC and g++ both support it.
It is used thus:
const char* input = ...;
std::size_t output_size = std::mbstowcs(NULL, input, 0); // length, not counting the terminating \0
// (size_t)-1 is returned if the input is not valid in the current locale
std::vector<wchar_t> output_buffer(output_size + 1); // +1 leaves room for the terminating \0
std::mbstowcs(&output_buffer[0], input, output_size + 1);
std::wstring output(&output_buffer[0]);
Once you have two wstrings, just compare as usual. Note that this will use the current system locale for conversion (i.e. on Windows this will be the current "ANSI" codepage) - normally this is just what you want, but occasionally you'll need to deal with a specific encoding, in which case the above won't do, and you'll need to use something like iconv.
EDIT
All other answers seem to go for direct codepoint translation (i.e. the equivalent of (wchar_t)c for every char c in the string). This may not work for all locales, but it will work if e.g. your char are all ASCII or Latin-1, and your wchar_t are Unicode. If you're sure that's what you really want, the fastest way is actually to avoid conversion altogether, and to use std::lexicographical_compare:
#include <algorithm>
#include <cstring>
#include <string>
const char* s = ...;
std::wstring ws = ...;
const char* s_end = s + strlen(s);
bool is_ws_less_than_s = std::lexicographical_compare(ws.begin(), ws.end(),
                                                      s, s_end);
bool is_s_less_than_ws = std::lexicographical_compare(s, s_end,
                                                      ws.begin(), ws.end());
bool is_s_equal_to_ws = !is_ws_less_than_s && !is_s_less_than_ws;
If you specifically need to test for equality, use std::equal with a length check:
#include <algorithm>
const char* s = ...;
std::wstring ws = ...;
std::size_t s_len = strlen(s);
bool are_equal =
ws.length() == s_len &&
std::equal(ws.begin(), ws.end(), s);
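Wrapped into a function, the equality test might look like the sketch below. equals_ascii is a made-up name, and the explicit predicate is an addition that avoids sign-extension surprises where char is signed; like the snippet above, it assumes both strings hold plain ASCII (or Latin-1 against Unicode).

```cpp
#include <algorithm>
#include <cstring>
#include <string>

// Hypothetical helper: compares a wide string with a narrow one element
// by element, assuming both hold plain ASCII.
bool equals_ascii(const std::wstring& ws, const char* s)
{
    const std::size_t s_len = std::strlen(s);
    return ws.length() == s_len &&
           std::equal(ws.begin(), ws.end(), s,
                      [](wchar_t w, char c) {
                          return w == static_cast<wchar_t>(static_cast<unsigned char>(c));
                      });
}
```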
The quick and dirty way is
if( std::wstring(your_char_ptr_string, your_char_ptr_string + strlen(your_char_ptr_string)) == your_wstring)
I say dirty because it will create a temporary string and copy your_char into it. However, it will work just fine as long as you are not in a tight loop.
Note that on Windows wstring uses 16-bit characters (i.e. UTF-16 code units, 65536 possible values each), whereas char* tends to hold 8-bit characters (ASCII or a Latin code page). They are not the same, so converting wstring to char* might lose information.
-Tom
First of all, ask yourself why you are mixing std::wstring, which typically holds Unicode text, with char* (C strings), which typically hold ANSI text. It is best practice to use Unicode because it allows your application to be internationalized, but using a mix doesn't make much sense in most cases. If you want your C strings to be Unicode, use wchar_t. If you want your STL strings to be ANSI, use std::string.
Now back to your question.
The first thing you want to do is convert one of them to match the other datatype.
std::string and std::wstring both have a c_str function.
Here are the function definitions:
const char* std::string::c_str() const
const wchar_t* std::wstring::c_str() const
I don't remember offhand how to convert char * to wchar_t * and vice versa, but after you do that you can use strcmp (or wcscmp, if both strings end up wide). If you google you'll find a way.
You could use the functions below to convert std::wstring to std::string then c_str will give you char * which you can strcmp
#include <string>
#include <algorithm>
// Prototype for conversion functions
std::wstring StringToWString(const std::string& s);
std::string WStringToString(const std::wstring& s);
std::wstring StringToWString(const std::string& s)
{
std::wstring temp(s.length(),L' ');
std::copy(s.begin(), s.end(), temp.begin());
return temp;
}
std::string WStringToString(const std::wstring& s)
{
std::string temp(s.length(), ' ');
std::copy(s.begin(), s.end(), temp.begin());
return temp;
}
Convert your wstring to a string.
wstring a = L"foobar";
string b(a.begin(),a.end());
Now you can compare it to any char* using b.c_str() or whatever you like.
char c[] = "foobar";
cout<<strcmp(b.c_str(),c)<<endl;