String comparisons. How can you compare string with std::wstring? WRT strcmp

I am trying to compare two formats that I expected would be somewhat compatible, since they are both generally strings. I have tried to perform strcmp with a string and std::wstring, and as I'm sure C++ gurus know, this will simply not compile. Is it possible to compare these two types? Is there an easy conversion here?

You need to convert your char* string - "multibyte" in ISO C parlance - to a wchar_t* string - "wide character" in ISO C parlance. The standard function that does that is called mbstowcs ("Multi-Byte String To Wide Character String")
NOTE: as Steve pointed out in comments, the usage below is not strictly ISO C++ conformant: mbstowcs itself is standard (declared in <cstdlib>), but passing a NULL destination to obtain the length is an extension (POSIX specifies it). MSVC and g++ both support it.
It is used thus:
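const char* input = ...;
std::size_t output_size = std::mbstowcs(NULL, input, 0); // length, not counting the terminating \0
if (output_size == static_cast<std::size_t>(-1)) { /* invalid multibyte sequence: handle the error */ }
std::vector<wchar_t> output_buffer(output_size + 1); // +1 for the terminating \0, so the buffer is never empty
std::mbstowcs(&output_buffer[0], input, output_size + 1);
std::wstring output(&output_buffer[0], output_size);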
Once you have two wstrings, just compare as usual. Note that this will use the current system locale for conversion (i.e. on Windows this will be the current "ANSI" codepage) - normally this is just what you want, but occasionally you'll need to deal with a specific encoding, in which case the above won't do, and you'll need to use something like iconv.
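As a rough sketch of the convert-then-compare approach described above: the helper names widen_current_locale and equal_after_widening are made up for illustration, and the sketch uses std::mbsrtowcs (whose counting mode with a null destination is standard) in place of mbstowcs. Like the code above, it depends on the current C locale.

```cpp
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper: widens a multibyte string using the current C locale.
std::wstring widen_current_locale(const char* narrow)
{
    std::mbstate_t state = std::mbstate_t();
    const char* src = narrow;
    // With a null destination, mbsrtowcs only counts the wide characters
    // (excluding the terminator) and leaves src untouched.
    const std::size_t len = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence");

    std::wstring wide(len, L'\0');
    if (len > 0) {
        state = std::mbstate_t();   // restart from the initial shift state
        src = narrow;
        std::mbsrtowcs(&wide[0], &src, len, &state);
    }
    return wide;
}

// Hypothetical helper: converts the narrow string, then compares as usual.
bool equal_after_widening(const char* narrow, const std::wstring& wide)
{
    return widen_current_locale(narrow) == wide;
}
```

If the input uses a specific encoding rather than the locale's, this sketch has the same limitation as the original code.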
EDIT
All other answers seem to go for direct codepoint translation (i.e. the equivalent of (wchar_t)c for every char c in the string). This may not work for all locales, but it will work if e.g. your char are all ASCII or Latin-1, and your wchar_t are Unicode. If you're sure that's what you really want, the fastest way is actually to avoid conversion altogether, and to use std::lexicographical_compare:
#include <algorithm>
#include <cstring>
#include <string>
const char* s = ...;
std::wstring ws = ...;
const char* s_end = s + std::strlen(s);
bool is_ws_less_than_s = std::lexicographical_compare(ws.begin(), ws.end(),
    s, s_end);
bool is_s_less_than_ws = std::lexicographical_compare(s, s_end,
    ws.begin(), ws.end());
bool is_s_equal_to_ws = !is_ws_less_than_s && !is_s_less_than_ws;
If you specifically need to test for equality, use std::equal with a length check:
#include <algorithm>
#include <cstring>
#include <string>
const char* s = ...;
std::wstring ws = ...;
std::size_t s_len = std::strlen(s);
bool are_equal =
ws.length() == s_len &&
std::equal(ws.begin(), ws.end(), s);
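One caveat with the direct comparison: on platforms where char is signed, bytes above 0x7F (for example Latin-1 accented letters) are sign-extended before being compared with a wchar_t, so they never match. A minimal sketch of one way around that, assuming a hypothetical helper named equal_codepoints that goes through unsigned char:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: compares element by element, treating each char as an
// unsigned value so that bytes above 0x7F are not sign-extended.
bool equal_codepoints(const char* s, const std::wstring& ws)
{
    const std::size_t s_len = std::strlen(s);
    return ws.length() == s_len &&
           std::equal(ws.begin(), ws.end(), s,
                      [](wchar_t w, char c) {
                          return w == static_cast<wchar_t>(
                                          static_cast<unsigned char>(c));
                      });
}
```

This is still a plain codepoint comparison, so it is only meaningful when the char data is ASCII or Latin-1 and the wchar_t data is Unicode.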

The quick and dirty way is
if( std::wstring(your_char_ptr_string, your_char_ptr_string + strlen(your_char_ptr_string)) == your_wstring)
I say dirty because it creates a temporary string and copies your_char_ptr_string into it, one char per wchar_t. However, it will work just fine as long as you are not in a tight loop.
Note that wstring uses wide characters (16 bits on Windows, typically 32 bits on Linux), whereas char* tends to hold 8-bit characters (ASCII, Latin-1). They are not the same, so wstring-->char* might lose information.
-Tom

First of all you have to ask yourself why you are using std::wstring which is a unicode format with char* (cstring) which is ansi. It is best practice to use unicode because it allows your application to be internationalized, but using a mix doesn't make much sense in most cases. If you want your cstrings to be unicode use wchar_t. If you want your STL strings to be ansi use std::string.
Now back to your question.
The first thing you want to do is convert one of them to match the other datatype.
std::string and std::wstring both have the c_str function.
Here are the function definitions:
const char* std::string::c_str() const
const wchar_t* std::wstring::c_str() const
I don't remember offhand how to convert char * to wchar_t * and vice versa, but after you do that you can use strcmp (or wcscmp, if you converted to wchar_t *). If you google you'll find a way.
You could use the functions below to convert std::wstring to std::string; c_str will then give you a char * which you can pass to strcmp. Note that they copy character by character, so they are only safe for ASCII content.
#include <string>
#include <algorithm>
// Prototype for conversion functions
std::wstring StringToWString(const std::string& s);
std::string WStringToString(const std::wstring& s);
std::wstring StringToWString(const std::string& s)
{
std::wstring temp(s.length(),L' ');
std::copy(s.begin(), s.end(), temp.begin());
return temp;
}
std::string WStringToString(const std::wstring& s)
{
std::string temp(s.length(), ' ');
std::copy(s.begin(), s.end(), temp.begin());
return temp;
}

Convert your wstring to a string.
wstring a = L"foobar";
string b(a.begin(),a.end());
Now you can compare it to any char* using b.c_str() or whatever you like.
char c[] = "foobar";
cout<<strcmp(b.c_str(),c)<<endl;

Related

Converting string to wchar_t (wide character) C++ [duplicate]

Is there any method?
My computer is AMD64.
::std::string str;
BOOL loadU(const wchar_t* lpszPathName, int flag = 0);
When I used:
loadU(&str);
the VS2005 compiler says:
Error 7 error C2664:: cannot convert parameter 1 from 'std::string *__w64 ' to 'const wchar_t *'
How can I do it?

First convert it to std::wstring:
std::wstring widestr = std::wstring(str.begin(), str.end());
Then get the C string:
const wchar_t* widecstr = widestr.c_str();
This works only for ASCII strings; it will not work if the underlying string is UTF-8 encoded. Using a conversion routine like MultiByteToWideChar() ensures that this scenario is handled properly.

If you have a std::wstring object, you can call c_str() on it to get a wchar_t*:
std::wstring name( L"Steve Nash" );
const wchar_t* szName = name.c_str();
Since you are operating on a narrow string, however, you would first need to widen it. There are various options here; one is to use Windows' built-in MultiByteToWideChar routine. That will give you an LPWSTR, which is equivalent to wchar_t*.

You can use the ATL text conversion macros to convert a narrow (char) string to a wide (wchar_t) one. For example, to convert a std::string:
#include <atlconv.h>
...
std::string str = "Hello, world!";
CA2W pszWide(str.c_str());
loadU(pszWide);
You can also specify a code page, so if your std::string contains UTF-8 chars you can use:
CA2W pszWide(str.c_str(), CP_UTF8);
Very useful but Windows only.

If you are on Linux/Unix, have a look at mbstowcs() and wcstombs(), which glibc provides (they come from ISO C90).
mbs stands for "Multi Byte String" and is basically the usual zero-terminated C string.
wcs stands for "Wide Char String" and is an array of wchar_t.
For more background details on wide chars, have a look at the glibc documentation.

I need to pass a wchar_t string to a function, and first I need to create the string from a literal string concatenated with an integer variable.
The original string looks like this, where 4 is the physical drive number, but I want that to be changeable to match whatever drive number I want to pass to the function:
auto TargetDrive = L"\\\\.\\PhysicalDrive4";
The following works:
int a = 4;
std::string stddrivestring = "\\\\.\\PhysicalDrive" + std::to_string(a);
std::wstring widedrivestring = std::wstring(stddrivestring.begin(), stddrivestring.end());
const wchar_t* TargetDrive = widedrivestring.c_str();
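A shorter variant, as a sketch: since C++11, std::to_wstring builds the wide digits directly, so the detour through std::string is not needed. The function name physical_drive_path is made up for illustration.

```cpp
#include <string>

// Hypothetical helper: builds the wide device path for a given drive number.
std::wstring physical_drive_path(int drive_number)
{
    return L"\\\\.\\PhysicalDrive" + std::to_wstring(drive_number);
}
```

The const wchar_t* obtained from c_str() on the result is only valid while that std::wstring object is alive, so keep the string in a named variable for as long as the pointer is used.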

converting narrow string to wide string

How can I convert a narrow string to a wide string?
I have tried this method:
string myName;
getline( cin , myName );
wstring printerName( L(myName) ); // error C3861: 'L': identifier not found
wchar_t* WprinterName = printerName.c_str(); // error C2440: 'initializing' : cannot convert from 'const wchar_t *' to 'wchar_t *'
But I get the errors listed above.
Why do I get these errors? How can I fix them?
Is there any other method of directly converting a narrow string to a wide string?

If the source is ASCII encoded, you can just do this:
wstring printerName;
printerName.assign( myName.begin(), myName.end() );

You should do this :
inline std::wstring convert( const std::string& as )
{
// deal with trivial case of empty string
if( as.empty() ) return std::wstring();
// determine required length of new string
size_t reqLength = ::MultiByteToWideChar( CP_UTF8, 0, as.c_str(), (int)as.length(), 0, 0 );
// construct new string of required length
std::wstring ret( reqLength, L'\0' );
// convert old string to new string
::MultiByteToWideChar( CP_UTF8, 0, as.c_str(), (int)as.length(), &ret[0], (int)ret.length() );
// return new string ( compiler should optimize this away )
return ret;
}
This expects the std::string to be UTF-8 (CP_UTF8); if your string uses another encoding, replace the code page accordingly.
Another way, which relies on Microsoft-specific behaviour (the %S format specifier), could be:
inline std::wstring convert( const std::string& as )
{
wchar_t* buf = new wchar_t[as.size() * 2 + 2];
swprintf( buf, L"%S", as.c_str() );
std::wstring rval = buf;
delete[] buf;
return rval;
}

I found this while googling the problem. I have pasted the code for reference. Author of this post is Paul McKenzie.
std::string str = "Hello";
std::wstring str2(str.length(), L' '); // Make room for characters
// Copy string to wstring.
std::copy(str.begin(), str.end(), str2.begin());

ATL (non-express editions of Visual Studio) has a couple useful class types which can convert the strings plainly. You can use the constructor directly, if you do not need to hold onto the string.
#include <atlbase.h>
std::wstring wideString(L"My wide string");
std::string narrowString("My not-so-wide string");
ATL::CW2A narrow(wideString.c_str()); // narrow holds the narrow (char) conversion
ATL::CA2W wide(narrowString.c_str()); // wide holds the wide (wchar_t) conversion

Here are two functions that can be used: mbstowcs_s and wcstombs_s.
mbstowcs_s: Converts a sequence of multibyte characters to a corresponding sequence of wide characters.
wcstombs_s: Converts a sequence of wide characters to a corresponding sequence of multibyte characters.
errno_t wcstombs_s(
size_t *pReturnValue,
char *mbstr,
size_t sizeInBytes,
const wchar_t *wcstr,
size_t count
);
errno_t mbstowcs_s(
size_t *pReturnValue,
wchar_t *wcstr,
size_t sizeInWords,
const char *mbstr,
size_t count
);
See http://msdn.microsoft.com/en-us/library/eyktyxsx.aspx and http://msdn.microsoft.com/en-us/library/s7wzt4be.aspx.

The Windows API provides routines for doing this: WideCharToMultiByte() and MultiByteToWideChar(). However, they are a pain to use. Each conversion requires two calls to the routines and you have to look after allocating/freeing memory and making sure the strings are correctly terminated. You need a wrapper!
I have a convenient C++ wrapper on my blog, here, which you are welcome to use.

The original question of this thread was: "How can i convert a narrow string to a wide string?"
However, from the example code given in the question, there seems to be no conversion necessary. Rather, there is a compiler error due to the newer compilers deprecating something that used to be okay. Here is what I think is going on:
// wchar_t* wstr = L"A wide string"; // Error: cannot convert from 'const wchar_t *' to 'wchar_t *'
wchar_t const* wstr = L"A wide string"; // okay
const wchar_t* wstr_equivalent = L"A wide string"; // also okay
c_str() returns a pointer to const (const wchar_t*), just as a wide string literal does, so it cannot initialize a plain wchar_t*. You could use a cast, but adding const is preferable.
The best answer I have seen for converting between wide and narrow strings is to use std::wstringstream. And this is one of the answers given to C++ Convert string (or char*) to wstring (or wchar_t*)
You can convert most anything to and from strings and wide strings using stringstream and wstringstream.
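A minimal sketch of the stream-based approach, assuming a hypothetical helper named widen_via_stream. It relies on the standard operator<< overload that inserts a const char* into a wide stream by widening each char with the stream's locale, so it is not an encoding-aware conversion.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: widens a narrow string by inserting it into a wide stream.
std::wstring widen_via_stream(const std::string& narrow)
{
    std::wostringstream stream;
    // Each char is widened individually; an embedded '\0' would end the text early.
    stream << narrow.c_str();
    return stream.str();
}
```

For plain ASCII input this gives the same result as copying the characters one by one.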

This article published on the MSDN Magazine 2016 September issue discusses the conversion in details using Win32 APIs.
Note that using MultiByteToWideChar() is much faster than using the std:: stuff on Windows.

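Use mbstowcs() (mbtowc() converts only a single character):
string myName;
wchar_t wstr[BUFFER_SIZE];
getline( cin , myName );
mbstowcs(wstr, myName.c_str(), BUFFER_SIZE); // converts using the current locale
wstr[BUFFER_SIZE - 1] = L'\0'; // make sure the result is terminated if the input was too long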

How do I convert wchar_t* to std::string?

I changed my class to use std::string (based on the answer I got here), but a function I have returns wchar_t *. How do I convert it to std::string?
I tried this:
std::string test = args.OptionArg();
but it says error C2440: 'initializing' : cannot convert from 'wchar_t *' to 'std::basic_string<_Elem,_Traits,_Ax>'

std::wstring ws( args.OptionArg() );
std::string test( ws.begin(), ws.end() );

You can convert a wide char string to an ASCII string using the following function:
#include <locale>
#include <sstream>
#include <string>
std::string ToNarrow( const wchar_t *s, char dfault = '?',
const std::locale& loc = std::locale() )
{
std::ostringstream stm;
while( *s != L'\0' ) {
stm << std::use_facet< std::ctype<wchar_t> >( loc ).narrow( *s++, dfault );
}
return stm.str();
}
Be aware that this will just replace any wide character for which an equivalent ASCII character doesn't exist with the dfault parameter; it doesn't convert from UTF-16 to UTF-8. If you want to convert to UTF-8 use a library such as ICU.

This is an old question, but if you're not really seeking conversions but rather using the TCHAR stuff from Microsoft to be able to build both ASCII and Unicode, recall that std::string is really
typedef std::basic_string<char> string;
So we could define our own typedef, say
#include <string>
#include <tchar.h>
namespace magic {
typedef std::basic_string<TCHAR> string;
}
Then you could use magic::string with TCHAR, LPCTSTR, and so forth

It's rather disappointing that none of the answers given to this old question addresses the problem of converting wide strings into UTF-8 strings, which is important in non-English environments.
Here's example code that works and may be used as a hint for constructing custom converters. It is based on example code from cppreference.com.
#include <iostream>
#include <clocale>
#include <string>
#include <cstdlib>
#include <array>
#include <stdexcept>
std::string convert(const std::wstring& wstr)
{
const int BUFF_SIZE = 7;
if (MB_CUR_MAX >= BUFF_SIZE) throw std::invalid_argument("BUFF_SIZE too small");
std::string result;
bool shifts = std::wctomb(nullptr, 0); // reset the conversion state
for (const wchar_t wc : wstr)
{
std::array<char, BUFF_SIZE> buffer;
const int ret = std::wctomb(buffer.data(), wc);
if (ret < 0) throw std::invalid_argument("inconvertible wide characters in the current locale");
buffer[ret] = '\0'; // make 'buffer' contain a C-style string
result = result + std::string(buffer.data());
}
return result;
}
int main()
{
auto loc = std::setlocale(LC_ALL, "en_US.utf8"); // UTF-8
if (loc == nullptr) throw std::logic_error("failed to set locale");
std::wstring wstr = L"aąß水𝄋-扫描-€𐍈\u00df\u6c34\U0001d10b";
std::cout << convert(wstr) << "\n";
}
This prints, as expected:
Explanation
7 seems to be the minimal secure value of the buffer size, BUFF_SIZE. This includes 4 as the maximum number of UTF-8 bytes encoding a single character; 2 for the possible "shift sequence", 1 for the trailing '\0'.
MB_CUR_MAX is a run-time variable, so static_assert is not usable here
Each wide character is translated into its char representation using std::wctomb
This conversion makes sense only if the current locale allows multi-byte representations of a character
For this to work, the application needs to set the proper locale. en_US.utf8 seems to be sufficiently universal (available on most machines). In Linux, available locales can be queried in the console via locale -a command.
Critique of the most upvoted answer
The most upvoted answer,
std::wstring ws( args.OptionArg() );
std::string test( ws.begin(), ws.end() );
works well only when the wide characters represent ASCII characters - but these are not what wide characters were designed for. In this solution, the converted string contains one char per each source wide char, ws.size() == test.size(). Thus, it loses information from the original wstring and produces strings that cannot be interpreted as proper UTF-8 sequences. For example, on my machine the string resulting from this simplistic conversion of "ĄŚĆII" prints as "ZII", even though its size is 5 (and should be 8).
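For comparison, a locale-independent sketch that encodes each wide character as UTF-8 by hand. The function name to_utf8 is made up for illustration, and the sketch assumes that every wchar_t holds one whole Unicode code point (as with a 32-bit wchar_t on Linux); it does not combine UTF-16 surrogate pairs and does not reject invalid code points.

```cpp
#include <string>

// Hypothetical helper: encodes each wchar_t as one code point in UTF-8.
// Assumption: a wchar_t holds a complete code point (no surrogate pairs).
std::string to_utf8(const std::wstring& wide)
{
    std::string out;
    for (wchar_t wc : wide) {
        const unsigned long cp = static_cast<unsigned long>(wc);
        if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Unlike the one-char-per-wide-char copy, the result here can be longer than the input, which is what a proper UTF-8 sequence requires for non-ASCII characters.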

You could just use wstring and keep everything in Unicode

just for fun :-):
const wchar_t* val = L"hello mfc";
std::string test((LPCTSTR)CString(val));

Following code is more concise:
wchar_t wstr[500];
char string[500];
sprintf(string,"%ls",wstr);

Assigning a "const char*" to std::string is allowed, but assigning to std::wstring doesn't compile. Why?

I assumed that std::wstring and std::string both provide more or less the same interface.
So I tried to enable unicode capabilities for our application
# ifdef APP_USE_UNICODE
typedef std::wstring AppStringType;
# else
typedef std::string AppStringType;
# endif
However that gives me a lot of compile errors when -DAPP_USE_UNICODE is used.
It turned out, that the compiler chokes when a const char[] is assigned to std::wstring.
EDIT: improved example by removing the usage of literal "hello".
#include <string>
void myfunc(const char h[]) {
string s = h; // compiles OK
wstring w = h; // compile Error
}
Why does it make such a difference?
Assigning a const char* to std::string is allowed, but assigning to std::wstring gives compile errors.
Shouldn't std::wstring provide the same interface as std::string? At least for such a basic operation as assignment?
(environment: gcc-4.4.1 on Ubuntu Karmic 32bit)

You should do:
#include <string>
int main() {
const wchar_t h[] = L"hello";
std::wstring w = h;
return 0;
}
std::string is a typedef of std::basic_string<char>, while std::wstring is a typedef of std::basic_string<wchar_t>. As such, the 'equivalent' C-string of a wstring is an array of wchar_ts.
The 'L' in front of the string literal is to indicate that you are using a wide-char string constant.

The relevant part of the string API is this constructor:
basic_string(const charT*);
For std::string, charT is char. For std::wstring it's wchar_t. So the reason it doesn't compile is that wstring doesn't have a char* constructor. Why doesn't wstring have a char* constructor?
There is no one unique way to convert a string of char to a string of wchar. What's the encoding used with the char string? Is it just 7 bit ASCII? Is it UTF-8? Is it UTF-7? Is it SHIFT-JIS? So I don't think it would entirely make sense for std::wstring to have an automatic conversion from char*, even though you could cover most cases. You can use:
w = std::wstring(h, h + sizeof(h) - 1);
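which will convert each char in turn to wchar (except the NUL terminator), and in this example that's probably what you want. Note that sizeof(h) gives the string length only when h is a true array; if h is a pointer or a function parameter (as in the edited question), use h + strlen(h) as the end instead. As int3 says though, if that's what you mean it's most likely better to use a wide string literal in the first place.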
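Because the char-by-char widening is only meaningful when the encoding is known, one option is to make the ASCII assumption explicit. A small sketch, with a hypothetical helper named widen_ascii that refuses anything outside 7-bit ASCII:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: widens a string only if every byte is 7-bit ASCII.
std::wstring widen_ascii(const std::string& narrow)
{
    std::wstring wide;
    wide.reserve(narrow.size());
    for (char c : narrow) {
        if (static_cast<unsigned char>(c) > 0x7F)
            throw std::invalid_argument("non-ASCII byte: the encoding must be known");
        wide.push_back(static_cast<wchar_t>(c));
    }
    return wide;
}
```

Taking a std::string (or a pointer plus length) also sidesteps the sizeof question, since the length no longer depends on how the array was declared.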

To convert from a multibyte encoding to a wide character encoding, take a look at the header <locale> and the type std::codecvt. The Dinkumware library has a class Dinkum::wstring_convert that makes performing such multibyte-to-wide conversions easier.
The class template std::codecvt_byname allows one to obtain a codecvt facet for a particular named locale. Unfortunately, discovering the names of the encodings (or locales) on your system is implementation-specific.

Small suggestion... Do not use "Unicode" strings under Linux (a.k.a. wide strings). std::string is perfectly fine and holds Unicode very well (UTF-8).
Most Linux API works with char * strings and most popular encoding is UTF-8.
So... Just don't bother yourself using wstring.

In addition to the other answers, you could use a trick from Microsoft's book (specifically, tchar.h), and write something like this:
# ifdef APP_USE_UNICODE
typedef std::wstring AppStringType;
#define _T(s) (L##s)
# else
typedef std::string AppStringType;
#define _T(s) (s)
# endif
AppStringType foo = _T("hello world!");
(Note: my macro-fu is weak, and this is untested, but you get the idea.)
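A variant of the same idea that has been kept deliberately small, as a sketch: the macro name APP_TEXT is made up here (names such as _T, which start with an underscore and a capital letter, are reserved for the implementation), and the extra macro level lets the argument itself be a macro.

```cpp
#include <string>

#ifdef APP_USE_UNICODE
typedef std::wstring AppStringType;
#define APP_TEXT_IMPL(s) L##s
#else
typedef std::string AppStringType;
#define APP_TEXT_IMPL(s) s
#endif
// Extra level of indirection so that a macro argument is expanded first.
#define APP_TEXT(s) APP_TEXT_IMPL(s)

// Hypothetical function showing a literal that follows the build setting.
AppStringType greeting()
{
    return APP_TEXT("hello world!");
}
```

Every string literal in the application then has to be wrapped in the macro, which is the main cost of this approach.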

Looks like you can do something like this:
#include <sstream>
// ...
std::wstringstream tmp;
tmp << "hello world";
std::wstring our_string = tmp.str();
Although for a more complex situation, you may want to break down and use mbstowcs.

You should use
#include <tchar.h>
tstring instead of wstring/string (tstring is not provided by tchar.h; typedef it yourself as std::basic_string<TCHAR>)
TCHAR* instead of char*
and _T("hello") instead of "hello" or L"hello"
This will use the appropriate form of string and char, depending on whether _UNICODE is defined.
