Unexpected results with wchar_t and c_str string conversion - c++

Based on this answer to a related question, I tried to write a method that converts a standard string to a wide string, which I can then convert into a wchar_t*.
Why aren't the two different ways of creating the wchar_t* equivalent? (I've shown the values that my debugger gives me).
TEST_METHOD(TestingAssertsWithGetWideString)
{
std::wstring wString1 = GetWideString("me");
const wchar_t* wchar1 = wString1.c_str(); // wchar1 = "me"
const wchar_t* wchar2 = GetWideString("me").c_str(); // wchar2 = "ﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮ#" (Why?!)
}
where GetWideString is defined as follows:
inline const std::wstring GetWideString(const std::string &str)
{
std::wstring wstr;
wstr.assign(str.begin(), str.end());
return wstr;
};
Note: the following doesn't work either.
const wchar_t* wchar2 = GetWChar("me");
const wchar_t *GetWChar(const std::string &str)
{
std::wstring wstr;
wstr.assign(str.begin(), str.end());
return wstr.c_str();
};

Each time you call GetWideString(), you are creating a new std::wstring, which has a newly allocated memory buffer. You are comparing pointers to different memory blocks (assuming Assert::AreEqual() is simply comparing the pointers themselves and not the contents of the memory blocks that are being pointed at).
Update: const wchar_t* wchar2 = GetWideString("me").c_str(); does not work because GetWideString() returns a temporary std::wstring that goes out of scope and gets freed as soon as the statement is finished. Thus you are obtaining a pointer to a temporary memory block, and then leaving that pointer dangling when that memory gets freed before you can use the pointer for anything.
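Also, const wchar_t* wchar2 = GetWChar("me"); has the same problem. GetWChar() returns wstr.c_str(), a pointer into a local std::wstring that is destroyed when the function returns, so the caller receives a dangling pointer. Return the std::wstring by value instead, keep it in a named variable, and call c_str() on that variable when you need a wchar_t*.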
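The lifetime rule described above can be sketched as follows. This is a minimal illustration, not the asker's test code: CountChars() is a hypothetical stand-in for any function that consumes a wchar_t*, and LifetimeExamples() exists only to group the cases.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Same ASCII-only widening as the question's GetWideString().
std::wstring GetWideString(const std::string& str) {
    return std::wstring(str.begin(), str.end());
}

// Hypothetical stand-in for any API that reads a NUL-terminated wchar_t*.
std::size_t CountChars(const wchar_t* text) {
    std::size_t n = 0;
    while (text[n] != L'\0') ++n;
    return n;
}

void LifetimeExamples() {
    // OK: the named std::wstring owns the buffer while it is in scope.
    std::wstring owner = GetWideString("me");
    const wchar_t* p = owner.c_str();
    assert(CountChars(p) == 2);

    // Also OK: the temporary lives until the end of the full expression,
    // which includes the call to CountChars().
    assert(CountChars(GetWideString("me").c_str()) == 2);

    // Not OK: the temporary is destroyed at the ';', so q would dangle.
    // const wchar_t* q = GetWideString("me").c_str();
}
```

The second case is why passing GetWideString(...).c_str() directly as a function argument works, while storing the pointer for later use does not.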

Because the two pointers aren't equal. A wchar_t * is not a String, so you get the generic AreEqual.

std::wstring consists of wide characters of type wchar_t. std::string consists of characters of type char. Special (non-ASCII) characters stored in a std::string use a multi-byte encoding, i.e. some characters are represented by two or more char elements. Converting between the two is therefore not as easy as calling a simple assign.
To convert between "wide" strings and multi-byte strings, you can use the following helpers (Windows only):
// multi byte to wide char:
std::wstring s2ws(const std::string& str)
{
int size_needed = MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), NULL, 0);
std::wstring wstrTo(size_needed, 0);
MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), &wstrTo[0], size_needed);
return wstrTo;
}
// wide char to multi byte:
std::string ws2s(const std::wstring& wstr)
{
// Pass the length without the terminator; otherwise the returned std::string ends with an embedded '\0'.
int size_needed = WideCharToMultiByte(CP_ACP, 0, wstr.c_str(), (int)wstr.length(), 0, 0, 0, 0);
std::string strTo(size_needed, 0);
WideCharToMultiByte(CP_ACP, 0, wstr.c_str(), (int)wstr.length(), &strTo[0], size_needed, 0, 0);
return strTo;
}
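To make the multi-byte point above concrete, here is a small portable sketch of what the per-character assign does to a UTF-8 string. NaiveWiden() and NaiveWidenExamples() are illustrative names, and the input is assumed to be UTF-8.

```cpp
#include <cassert>
#include <string>

// The question's approach: copy each char into its own wchar_t.
std::wstring NaiveWiden(const std::string& s) {
    return std::wstring(s.begin(), s.end());
}

void NaiveWidenExamples() {
    // Pure ASCII survives, because ASCII code points equal their byte values.
    assert(NaiveWiden("me") == L"me");

    // "é" (U+00E9) is the two bytes 0xC3 0xA9 in UTF-8.
    const std::string utf8 = "\xC3\xA9";
    const std::wstring widened = NaiveWiden(utf8);

    // Each byte became a separate wchar_t, so the result has two elements
    // instead of the single character U+00E9.
    assert(widened.size() == 2);
    assert(widened != std::wstring(1, wchar_t(0x00E9)));
}
```

A real conversion has to decode the byte sequence, which is what MultiByteToWideChar() does in the helpers above.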

Related

Equivalent wstring operation?

I'm trying to do the same operation, but with code that compiles under Visual C++ 2010:
wstring widen(string Str) {
const size_t wcharCount = Str.size() + 1;
vector<wchar_t> Buffer(wcharCount);
return wstring{
Buffer.data(),(size_t)MultiByteToWideChar(CP_UTF8, 0, Str.c_str(), -1, Buffer.data(), (int)wcharCount);
};
}
Specifically this:
return wstring{
Buffer.data(),(size_t)MultiByteToWideChar(CP_UTF8, 0, Str.c_str(), -1, Buffer.data(), (int)wcharCount);
};
How do I return a wstring some other way? I'm getting errors because the compiler I must use is older.
I also want to learn more about making code compatible with older versions of Microsoft Visual C++ (in this case, I want to compile it with the 2010 version, which I have).
EDIT:
In case it helps to track down the problem, these are the errors:
|Line 5| error C2143: syntax error : missing ';' before '{'|
|Line 5| error C2275: 'std::wstring' : illegal use of this type as an expression|
For older (pre-C++11) compilers, use parentheses (return wstring(...)) instead of braces (return wstring{...}), and use std::vector::operator[] to access the allocated array instead of std::vector::data() (which didn’t exist yet), e.g.:
return wstring(
&Buffer[0], MultiByteToWideChar(CP_UTF8, 0, Str.c_str(), -1, &Buffer[0], wcharCount)
);
That being said, the source std::string’s size is the wrong size to use for the std::vector. Call MultiByteToWideChar() twice - call it once with a NULL output buffer to calculate the necessary size, and then call it again to write to the buffer. And, you should be using the std::string’s actual size instead of -1 for the source buffer size.
wstring widen(const string &Str) {
const int wcharCount = MultiByteToWideChar(CP_UTF8, 0, Str.c_str(), (int)Str.size(), NULL, 0);
// Empty input or conversion failure: &Buffer[0] on an empty vector would be invalid.
if (wcharCount <= 0) return wstring();
vector<wchar_t> Buffer(wcharCount);
return wstring(
&Buffer[0], MultiByteToWideChar(CP_UTF8, 0, Str.c_str(), (int)Str.size(), &Buffer[0], wcharCount)
);
}
Note that in C++11 and later, it is safe to use a pre-sized std::wstring directly as the destination buffer. You don’t need the std::vector at all, because std::wstring is guaranteed to store its character data in a single contiguous block of memory. Even with earlier compilers this is usually safe in practice (though NOT guaranteed), because most implementations use a contiguous block anyway for efficiency:
wstring widen(const string &Str) {
const int wcharCount = MultiByteToWideChar(CP_UTF8, 0, Str.c_str(), Str.size(), NULL, 0);
wstring Buffer;
if (wcharCount > 0) {
Buffer.resize(wcharCount);
MultiByteToWideChar(CP_UTF8, 0, Str.c_str(), Str.size(), &Buffer[0]/* or Buffer.data() in C++17 */, wcharCount);
}
return Buffer;
}
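For readers not on Windows, the same measure-then-fill pattern can be sketched with the standard std::mbstowcs. This swaps in a different API from the answer's MultiByteToWideChar(), and it rests on assumptions: the input is encoded in the current C locale (not necessarily UTF-8), length reporting with a null destination is a POSIX extension that common implementations provide, and conversion stops at the first embedded NUL. WidenPortable() is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Assumption: s is encoded in the current C locale (see std::setlocale).
std::wstring WidenPortable(const std::string& s) {
    const std::size_t kError = static_cast<std::size_t>(-1);

    // First call: null destination, so only the required length is reported.
    const std::size_t needed = std::mbstowcs(nullptr, s.c_str(), 0);
    if (needed == kError || needed == 0) {
        return std::wstring(); // invalid byte sequence, or nothing to convert
    }

    // Second call: write into a pre-sized std::wstring (contiguous since C++11).
    std::wstring out(needed, L'\0');
    const std::size_t written = std::mbstowcs(&out[0], s.c_str(), needed);
    if (written == kError) {
        return std::wstring();
    }
    out.resize(written);
    return out;
}
```

In the default "C" locale this handles plain ASCII only; anything else requires selecting a suitable locale first.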

How to convert std::string to wchar_t*

std::regex regexpy("y:(.+?)\"");
std::smatch my;
regex_search(value.text, my, regexpy);
y = my[1];
std::wstring wide_string = std::wstring(y.begin(), y.end());
const wchar_t* p_my_string = wide_string.c_str();
wchar_t* my_string = const_cast<wchar_t*>(p_my_string);
URLDownloadToFile(my_string, aDest);
I'm using Unicode, and the source string is encoded as ASCII. URLDownloadToFile expands to URLDownloadToFileW (which takes wchar_t*). The code above compiles in debug mode, but with a lot of warnings like:
warning C4244: 'argument': conversion from 'wchar_t' to 'const _Elem', possible loss of data
So my question is: how can I convert a std::string to a wchar_t*?
First off, you don't need the const_cast, as URLDownloadToFileW() takes a const wchar_t* as input, so passing it wide_string.c_str() will work as-is:
URLDownloadToFile(..., wide_string.c_str(), ...);
That being said, you are constructing a std::wstring with the individual char values of a std::string as-is. That will work without data loss only for ASCII characters <= 127, which have the same numeric values in both ASCII and Unicode. For non-ASCII characters, you need to actually convert the char data to Unicode, such as with MultiByteToWideChar() (or equivalent), e.g.:
std::wstring to_wstring(const std::string &s)
{
std::wstring wide_string;
// NOTE: be sure to specify the correct codepage that the
// std::string data is actually encoded in...
int len = MultiByteToWideChar(CP_ACP, 0, s.c_str(), s.size(), NULL, 0);
if (len > 0) {
wide_string.resize(len);
MultiByteToWideChar(CP_ACP, 0, s.c_str(), s.size(), &wide_string[0], len);
}
return wide_string;
}
URLDownloadToFileW(..., to_wstring(y).c_str(), ...);
That being said, there is a simpler solution. If the std::string is encoded in the user's default locale, you can simply call URLDownloadToFileA() instead, passing it the original std::string as-is, and let the OS handle the conversion for you, eg:
URLDownloadToFileA(..., y.c_str(), ...);
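There is a cross-platform solution: you can use std::mbtowc. It decodes according to the current C locale, so call std::setlocale first if the input is not plain ASCII.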
std::wstring convert_mb_to_wc(std::string s) {
std::wstring out;
std::mbtowc(nullptr, 0, 0);
int offset;
size_t index = 0;
for (wchar_t wc;
(offset = std::mbtowc(&wc, &s[index], s.size() - index)) > 0;
index += offset) {
out.push_back(wc);
}
return out;
}
Adapted from an example on cppreference.com at https://en.cppreference.com/w/cpp/string/multibyte/mbtowc .

How do I assign a value to TCHAR* without using a string literal with TEXT()?

I need to assign a value to a TCHAR* variable in C++ and I have been told that this is accomplished using the TEXT() macro. However, I have found that I am only able to do this when using string literals.
//This assignment uses a string literal and works
TCHAR* name = TEXT("example");
//This assignment uses a local variable and causes an error
char* example = "example";
TCHAR* otherName = TEXT(example);
This wouldn't be a problem if the value passed to TEXT() weren't determined by the user at runtime. Therefore, I need to store the value in some kind of local variable and pass it to the TEXT() macro. How can I use a local variable with TEXT() instead of a string literal? Or is there another way to assign the value to the TCHAR* variable?
The TEXT() macro only works for literals at compile-time. For non-literal data, you have to perform a runtime conversion instead.
If UNICODE is defined for the project, TCHAR will map to wchar_t, and you will have to use MultiByteToWideChar() (or equivalent) to convert your char* value to a wchar_t buffer:
const char* example = "example";
int example_len = strlen(example);
int otherName_len = MultiByteToWideChar(CP_ACP, 0, example, example_len, NULL, 0);
TCHAR* otherName = new TCHAR[otherName_len+1];
MultiByteToWideChar(CP_ACP, 0, example, example_len, otherName, otherName_len);
otherName[otherName_len] = 0;
// use otherName as needed ...
delete[] otherName;
If UNICODE is not defined, TCHAR will map to char instead, and you can just assign your char* directly:
const char* example = "example";
const TCHAR* otherName = example;
I would suggest using C++ strings to help you:
std::basic_string<TCHAR> toTCHAR(const std::string &s)
{
#ifdef UNICODE
std::basic_string<TCHAR> result;
int len = MultiByteToWideChar(CP_ACP, 0, s.c_str(), s.length(), NULL, 0);
if (len > 0)
{
result.resize(len);
MultiByteToWideChar(CP_ACP, 0, s.c_str(), s.length(), &result[0], len);
}
return result;
#else
return s;
#endif
}
const char* example = "example";
std::basic_string<TCHAR> otherName = toTCHAR(example);

C++ Unicode Issue

I'm having a bit of trouble with handling Unicode conversions.
The following code outputs this into my text file.
HELLO??O
std::string test = "HELLO";
std::string output;
int len = WideCharToMultiByte(CP_OEMCP, 0, (LPCWSTR)test.c_str(), -1, NULL, 0, NULL, NULL);
char *buf = new char[len];
int len2 = WideCharToMultiByte(CP_OEMCP, 0, (LPCWSTR)test.c_str(), -1, buf, len, NULL, NULL);
output = buf;
std::wofstream outfile5("C:\\temp\\log11.txt");
outfile5 << test.c_str();
outfile5 << output.c_str();
outfile5.close();
But as you can see, output is just a unicode conversion from the test variable. How is this possible?
Check that len is correct after the first (measuring) call. In general, you should not cast test.c_str() to LPCWSTR: test is a char-based std::string, not a wchar_t-based std::wstring. You could cast it to LPCSTR (note the missing 'W'); the WinAPI distinguishes between the two. If you want to store wide characters, you really should be using std::wstring. After re-reading your code: test should be a std::wstring, and then its c_str() can be passed as an LPCWSTR safely.
After reading this
Microsoft wstring reference
I changed
std::string test = "HELLO";
to
std::wstring test = L"HELLO";
And the string was outputted correctly and I got
HELLOHELLO
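The reason the original cast misbehaved is that (LPCWSTR) converts nothing; it only makes the API read the same bytes as 16-bit units. The portable sketch below illustrates that reinterpretation with memcpy (to avoid aliasing problems). FirstUnitAs16Bit() and CastExamples() are illustrative names, and the comments assume 16-bit wide characters as on Windows.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Reads the first two bytes of s as one 16-bit unit, which is how a
// wide-character API sees char data that was merely cast to LPCWSTR.
// Assumption: s holds at least two bytes.
std::uint16_t FirstUnitAs16Bit(const std::string& s) {
    std::uint16_t unit = 0;
    std::memcpy(&unit, s.data(), sizeof unit);
    return unit;
}

void CastExamples() {
    const std::string narrow = "HELLO";

    // 'H' (0x48) and 'E' (0x45) are fused into a single code unit:
    // 0x4548 on little-endian machines, 0x4845 on big-endian ones.
    const std::uint16_t unit = FirstUnitAs16Bit(narrow);
    assert(unit == 0x4548 || unit == 0x4845);
    assert(unit != 'H');

    // A real wide string stores one character per element instead.
    const std::wstring wide = L"HELLO";
    assert(wide[0] == L'H');
}
```

Read this way on a little-endian machine, the bytes of "HELLO" plus its terminator form the units 0x4548, 0x4C4C and 0x004F. The first two presumably have no representation in the OEM code page, and the third is 'O', which would account for the "??O" in the question's output.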

WINAPI: how to get edit's text to a std::string?

I'm trying the following code:
int length = SendMessage(textBoxPathToSource, WM_GETTEXTLENGTH, 0, 0);
LPCWSTR lpstr;
SendMessage(textBoxPathToSource, WM_GETTEXT, length+1, LPARAM(LPCWSTR));
std::string s(lpstr);
But it doesn't work.
You're using it incorrectly.
First, you are passing a type instead of a value here:
SendMessage(textBoxPathToSource, WM_GETTEXT, length+1, LPARAM(LPCWSTR));
WinAPI functions that write a string need a writable buffer that you supply; they cannot write into the read-only pointer returned by std::string::c_str(), and an uninitialized LPCWSTR does not point to any storage.
You need to define a space to hold the value:
WCHAR wszBuff[256] = {0};
(Of course, you could allocate the storage with new instead. You did neither; you only declared LPCWSTR lpstr.)
Extract the string and store it in that buffer:
SendMessage(textBoxPathToSource, WM_GETTEXT, 256, (LPARAM)wszBuff);
and then construct the string from the buffer: std::wstring s(wszBuff).
EDIT:
Please note the use of std::wstring, not std::string.
What ALevy said is correct, but it'd be better to use a std::vector<WCHAR> than a fixed-size buffer (or using new):
std::wstring s;
int length = static_cast<int>(SendMessageW(textBoxPathToSource, WM_GETTEXTLENGTH, 0, 0));
if (length > 0)
{
std::vector<WCHAR> buf(length + 1 /* NUL */);
SendMessageW(textBoxPathToSource,
WM_GETTEXT,
buf.size(),
reinterpret_cast<LPARAM>(&buf[0]));
s = &buf[0];
}