Dynamic conversion from multibyte to wide string (char* to Platform::String) - c++

Is it safe to assume that a wide string will always require the same number of characters as the multibyte string it was converted from, or fewer?
Is it ok to do something like:
char* source = "Hello World!"; //Some dynamic string...
int length = strlen(source);
wchar_t* temp = new wchar_t* ... //Allocate mem for this based on length+1
mbstowcs(temp, source, length+1);
Platform::String result(temp);
Or should I use mbstowcs to find the size?
char* source = "Hello World!"; //Some dynamic string...
size_t bufferSize;
mbstowcs_s(&bufferSize, nullptr, 0, source, 0); // Get the required buffer size.
wchar_t* buffer = new wchar_t* ... //Allocate mem for this based on bufferSize
mbstowcs_s(&bufferSize, buffer, bufferSize, source, bufferSize); // Copy the string.
Platform::String result(buffer);
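For what it is worth, every wide character is produced from at least one byte of the multibyte string, so strlen(source) + 1 elements is an upper bound, while asking the conversion function for the exact size avoids over-allocating. Below is a minimal sketch of the second approach. It assumes the portable mbstowcs instead of mbstowcs_s and a std::wstring instead of a raw buffer; the helper name widen is made up for illustration, and building the Platform::String from the result is left out.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the question): converts a multibyte string,
// interpreted in the current C locale, to a wide string.
std::wstring widen(const char* source) {
    // With a null destination, mbstowcs reports the required length,
    // not counting the terminator (POSIX and MSVC behavior).
    std::size_t needed = std::mbstowcs(nullptr, source, 0);
    if (needed == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence");

    std::wstring result(needed, L'\0');
    if (needed > 0)
        std::mbstowcs(&result[0], source, needed);  // fills exactly `needed` characters
    return result;
}
```

Calling widen("Hello World!") in the default "C" locale yields a 12-character wide string; result.c_str() can then be handed to whatever API expects a const wchar_t*.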

Why does RAD Studio CreateBlobStream with CryptUnprotectData return extra characters?

I'm writing a recovery app that pulls passwords from Chrome. It has a GUI, so I've used their SQLite wrapper, which uses both SQLConnection and SQLQuery. Here's a snip of my code:
//Create our blob stream
TStream *Stream2 = SQLQuery1->CreateBlobStream(SQLQuery1->FieldByName("password_value"), bmRead);
//Get our blob size
int size = Stream2->Size;
//Create our buffer
char* pbDataInput = new char[size+1];
//Adding null terminator to buffer
memset(pbDataInput, 0x00, sizeof(char)*(size+1));
//Write to our buffer
Stream2->ReadBuffer(pbDataInput, size);
DWORD cbDataInput = size;
DataOut.pbData = pbDataInput;
DataOut.cbData = cbDataInput;
LPWSTR pDescrOut = NULL;
//Decrypt password
CryptUnprotectData( &DataOut,
&pDescrOut,
NULL,
NULL,
NULL,
0,
&DataVerify);
//Output password
UnicodeString password = (UnicodeString)(char*)DataVerify.pbData;
passwordgrid->Cells[2][i] = password;
The output data looks fine, except it behaves as if something went wrong with my null terminator. Here's what output looks like on every line:
I've read the following:
Windows doc for CryptUnprotectData:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa382377.aspx
Embarcadero documentation for CreateBlobStream:
http://docwiki.embarcadero.com/Libraries/en/Data.DB.TDataSet.CreateBlobStream
memset:
http://www.cplusplus.com/reference/cstring/memset/
Your reading and decrypting calls operate on raw bytes only. They know nothing about strings and don't care about them. The null terminator you are adding to pbDataInput is never used, so get rid of it:
//Get our blob size
int size = Stream2->Size;
//Create our buffer
char* pbDataInput = new char[size];
//Write to our buffer
Stream2->ReadBuffer(pbDataInput, size);
DWORD cbDataInput = size;
...
delete[] pbDataInput;
delete Stream2;
Now, when assigning pbData to password, you are casting pbData to char*, so the UnicodeString constructor interprets the data as a null-terminated ANSI string and will convert it to UTF-16 using the system default ANSI codepage, which is potentially a lossy conversion for non-ASCII characters. Is that what you really want?
If so, and if the decrypted data is not actually null-terminated, you have to specify the number of characters to the UnicodeString constructor:
UnicodeString password( (char*)DataVerify.pbData, DataVerify.cbData );
On the other hand, if the decrypted output is already in UTF-16, you need to cast pbData to wchar_t* instead:
UnicodeString password = (wchar_t*)DataVerify.pbData;
Or, if not null-terminated:
UnicodeString password( (wchar_t*)DataVerify.pbData, DataVerify.cbData / sizeof(wchar_t) );
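The difference between a byte count and a character count can be shown without the VCL. The following is a rough, portable sketch, assuming the blob holds either single-byte text or little-endian UTF-16 on a little-endian host. std::string and std::u16string stand in for UnicodeString, and both function names are invented for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Treat the blob as single-byte text: the character count equals the byte count.
std::string blob_as_narrow(const unsigned char* data, std::size_t byteCount) {
    return std::string(reinterpret_cast<const char*>(data), byteCount);
}

// Treat the blob as UTF-16: the character count is the byte count divided by
// the size of one code unit. Assumes the host byte order matches the data.
std::u16string blob_as_utf16(const unsigned char* data, std::size_t byteCount) {
    std::u16string text(byteCount / sizeof(char16_t), u'\0');
    if (!text.empty())
        std::memcpy(&text[0], data, text.size() * sizeof(char16_t));
    return text;
}
```

Neither function looks for a terminator, so bytes that happen to follow the blob in memory never leak into the result.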

memcpy CString to char*

I'm trying to copy a CString to a char* using memcpy() and I have difficulties doing it. In fact, only the first character is copied. Here is my code:
CString str = _T("something");
char* buff = new char();
memcpy(buff, str, str.GetLength() + 1);
After this, all that buff contains is the letter s.
You are probably mixing ASCII and Unicode strings. If you compile with the Unicode setting, CString stores a UTF-16 string (two bytes per character; in your case every second byte is 0, which looks like an ASCII string terminator).
If you want all ASCII:
CStringA str = "something";
char* buff = new char[str.GetLength()+1];
memcpy(buff, (LPCSTR)str, str.GetLength() + 1);
If you want all Unicode:
CStringW str = L"something";
wchar_t* buff = new wchar_t[str.GetLength()+1];
memcpy(buff, (LPCWSTR)str, sizeof(wchar_t)*(str.GetLength() + 1));
If you want it working on both settings:
CString str = _T("something");
TCHAR* buff = new TCHAR[str.GetLength()+1];
memcpy(buff, (LPCTSTR)str, sizeof(TCHAR) * (str.GetLength() + 1));
If you want to convert a Unicode string to an ASCII string:
CString str = _T("something");
char* buff = new char[str.GetLength()+1];
memcpy(buff, (LPCSTR)CT2A(str), str.GetLength() + 1);
Note the casts from str to LPCSTR, LPCWSTR or LPCTSTR, and the corrected buffer allocation (you need room for multiple characters, not only one).
Also, I am not quite sure this is really what you need. A strdup, for example, looks much simpler than a new + memcpy.
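The byte-versus-character arithmetic in the snippets above can be tried outside MFC. This is a small sketch that uses a plain wchar_t string in place of CStringW; the helper name clone_wide is hypothetical, and std::unique_ptr is used only so the buffer is released automatically.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <memory>

// Hypothetical helper: copies a null-terminated wide string into a new buffer.
std::unique_ptr<wchar_t[]> clone_wide(const wchar_t* src) {
    // Element count, including the terminator.
    std::size_t count = std::wcslen(src) + 1;
    // new[] takes an element count...
    std::unique_ptr<wchar_t[]> copy(new wchar_t[count]);
    // ...while memcpy takes a byte count, hence the sizeof factor.
    std::memcpy(copy.get(), src, count * sizeof(wchar_t));
    return copy;
}
```

Dropping the sizeof(wchar_t) factor would copy only part of the string, which is the same kind of mismatch that makes a wide string look like a one-letter narrow string.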
You have only allocated memory to hold a single char. To do what you intend, you need to allocate enough memory to hold the complete string, and memcpy needs a byte count rather than a character count.
CString str = _T("something");
LPTSTR buff = new TCHAR[str.GetLength()+1]; //allocate sufficient memory (new[] counts elements, not bytes)
memcpy(buff, (LPCTSTR)str, (str.GetLength() + 1) * sizeof(TCHAR)); //copy every byte, including the terminator
You are (1) only allocating one char, which won't be enough unless the CString is empty, and (2) copying the CString instance instead of the string it represents.
Try
CString str = _T("something");
int size = str.GetLength() + 1;
char* buff = new char[size];
memcpy(buff, str.GetBuffer(), size);

One file lib to conv utf8 (char*) to wchar_t?

I am using libjson, which is awesome. The only problem I have is that I need to convert a UTF-8 string (char*) to a wide char string (wchar_t*). I googled and tried 3 different libs and they ALL failed (due to missing headers).
I don't need anything fancy. Just a one way conversion. How do I do this?
If you're on Windows (which, chances are, you are, given your need for wchar_t), use the MultiByteToWideChar function (declared in windows.h), like so:
int length = MultiByteToWideChar(CP_UTF8, 0, src, src_length, 0, 0);
wchar_t *output_buffer = new wchar_t [length];
MultiByteToWideChar(CP_UTF8, 0, src, src_length, output_buffer, length);
Alternatively, if all you're looking for is a conversion according to the current C locale (which decodes UTF-8 only when that locale uses UTF-8), use mbstowcs (stdlib.h):
wchar_t * output_buffer = 0;
size_t length = mbstowcs(NULL, src, 0); // required length, not counting the terminator
if(length != (size_t)-1){ // (size_t)-1 signals an invalid multibyte sequence
output_buffer = new wchar_t[length+1];
mbstowcs(output_buffer, src, length+1); // also writes the terminator
}
Hope this helps.
The code below successfully enables CreateDirectoryW() to write to C:\Users\ПетрКарасев. It is basically an easier-to-understand wrapper around the MultiByteToWideChar call mentioned in an earlier answer.
#include <windows.h>
#include <stdexcept>
#include <string>
std::wstring utf16_from_utf8(const std::string & utf8)
{
    // Special case of empty input string
    if (utf8.empty())
        return std::wstring();
    // Step 1: get length (in wchar_t's) of resulting UTF-16 string
    const int utf16_length = ::MultiByteToWideChar(
        CP_UTF8, // convert from UTF-8
        0, // default flags
        utf8.data(), // source UTF-8 string
        static_cast<int>(utf8.length()), // length (in chars) of source UTF-8 string
        NULL, // unused - no conversion done in this step
        0 // request size of destination buffer, in wchar_t's
    );
    if (utf16_length == 0)
    {
        // Error: ::GetLastError() reports the reason.
        // (A bare "throw;" outside a catch block would terminate the program.)
        throw std::runtime_error("MultiByteToWideChar failed to compute the length");
    }
    // Step 2: allocate properly sized destination buffer for UTF-16 string
    std::wstring utf16;
    utf16.resize(utf16_length);
    // Step 3: do the actual conversion from UTF-8 to UTF-16
    if ( ! ::MultiByteToWideChar(
        CP_UTF8, // convert from UTF-8
        0, // default flags
        utf8.data(), // source UTF-8 string
        static_cast<int>(utf8.length()), // length (in chars) of source UTF-8 string
        &utf16[0], // destination buffer
        static_cast<int>(utf16.length()) // size of destination buffer, in wchar_t's
    ) )
    {
        // The conversion failed; ::GetLastError() reports the reason.
        throw std::runtime_error("MultiByteToWideChar failed to convert the string");
    }
    return utf16; // success
}
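Here is a piece of code I wrote. It seems to work well enough. It returns 0 on a UTF-8 error or when the value is > FFFF (which can't be held by a 16-bit wchar_t). On success, the caller owns the returned buffer and must release it with delete[].
#include <algorithm>
#include <string>
using namespace std;
wchar_t* utf8_to_wchar(const char*utf8){
    wstring sz;
    wchar_t c;
    auto p=utf8;
    while(*p!=0){
        // Work on unsigned bytes so the tests do not depend on whether char is signed.
        unsigned char v=(unsigned char)*p;
        if(v<0x80){ // plain ASCII byte
            c = v;
            sz+=c;
            ++p;
            continue;
        }
        int shiftCount=0;
        if((v&0xE0) == 0xC0){ // lead byte of a 2-byte sequence
            shiftCount=1;
            c = v&0x1F;
        }
        else if((v&0xF0) == 0xE0){ // lead byte of a 3-byte sequence
            shiftCount=2;
            c = v&0xF;
        }
        else
            return 0; // 4-byte sequences (> FFFF) and invalid lead bytes
        ++p;
        while(shiftCount){
            v = (unsigned char)*p;
            ++p;
            if((v&0xC0) != 0x80) return 0; // not a continuation byte
            c<<=6;
            c |= (v&0x3F);
            --shiftCount;
        }
        sz+=c;
    }
    // Copy to the heap: returning sz.c_str() would dangle once sz is destroyed.
    wchar_t* result = new wchar_t[sz.size()+1];
    copy(sz.begin(), sz.end(), result);
    result[sz.size()] = 0;
    return result;
}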
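Values above FFFF do not have to be rejected: UTF-16 represents them as a pair of surrogate code units. The following is a rough sketch of a decoder that also handles 4-byte sequences. It is a separate illustration, not a patch to the code above; the function name utf8_to_utf16 is made up, and it produces std::u16string so that the result does not depend on the platform's wchar_t width.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decodes UTF-8 into UTF-16 code units.
// Returns false on malformed input (bad lead byte, truncated or invalid
// continuation, overlong form, encoded surrogate, or value above U+10FFFF).
bool utf8_to_utf16(const std::string& in, std::u16string& out) {
    out.clear();
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i++]);
        std::uint32_t cp = 0;       // code point being assembled
        std::uint32_t minimum = 0;  // smallest value allowed for this sequence length
        int extra = 0;              // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; minimum = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; minimum = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; minimum = 0x10000; }
        else return false;
        for (; extra > 0; --extra) {
            if (i >= in.size()) return false;                 // truncated sequence
            unsigned char next = static_cast<unsigned char>(in[i++]);
            if ((next & 0xC0) != 0x80) return false;          // not a continuation byte
            cp = (cp << 6) | (next & 0x3F);
        }
        if (cp < minimum || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;                                    // split into a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return true;
}
```

On Windows, where wchar_t is 16 bits wide, the code units could be copied one for one into a std::wstring; that step is left out here.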
The following (untested) code shows how to convert a multibyte string in your current locale into a wide string. So if your current locale is UTF-8, then this will suit your needs.
const char * inputStr = ... // your UTF-8 input
size_t maxSize = strlen(inputStr) + 1;
wchar_t * outputWStr = new wchar_t[maxSize];
size_t result = mbstowcs(outputWStr, inputStr, maxSize);
if (result == (size_t)-1) { // mbstowcs reports errors as (size_t)-1
cerr << "Invalid multibyte characters in input";
}
You can use setlocale() to set your locale.
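Because setlocale() changes process-wide state, it can be worth restoring the previous locale after the conversion. Here is a minimal sketch of that pattern, with the same strlen + 1 upper bound as above. The helper name widen_in_locale is hypothetical, and which locale names are accepted (for example "en_US.UTF-8") depends on the platform.

```cpp
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: converts `input` under the named locale, then restores
// the previous LC_CTYPE locale. Returns false if the locale is unavailable or
// the input is not valid in that locale.
bool widen_in_locale(const char* localeName, const char* input, std::wstring& output) {
    // Querying with a null name returns the current locale; copy it, because
    // the returned pointer may be invalidated by the next setlocale call.
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    std::string previous = current ? current : "C";

    if (!std::setlocale(LC_CTYPE, localeName))
        return false;

    // One wide character per input byte, plus the terminator, is always enough.
    std::size_t maxSize = std::strlen(input) + 1;
    std::wstring buffer(maxSize, L'\0');
    std::size_t converted = std::mbstowcs(&buffer[0], input, maxSize);

    std::setlocale(LC_CTYPE, previous.c_str());  // restore before returning

    if (converted == static_cast<std::size_t>(-1))
        return false;
    buffer.resize(converted);
    output = buffer;
    return true;
}
```

This is not thread-safe, since other threads observe the temporary locale change; the locale-taking variants of the conversion functions avoid that where they are available.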

Why is the following C++ code printing only the first character?

I am trying to convert a char string to a wchar string.
In more detail: I am trying to convert a char[] to a wchar[] first, then append " 1" to that string and print it.
char src[256] = "c:\\user";
wchar_t temp_src[256];
mbtowc(temp_src, src, 256);
wchar_t path[256];
StringCbPrintf(path, 256, _T("%s 1"), temp_src);
wcout << path;
But it prints just c.
Is this the right way to convert from char to wchar? I have since learned of another way, but I'd like to know why the above code behaves the way it does.
mbtowc converts only a single character. Did you mean to use mbstowcs?
Typically you call this function twice; the first to obtain the required buffer size, and the second to actually convert it:
#include <cstdlib> // for mbstowcs
const char* mbs = "c:\\user";
size_t requiredSize = ::mbstowcs(NULL, mbs, 0);
wchar_t* wcs = new wchar_t[requiredSize + 1];
if(::mbstowcs(wcs, mbs, requiredSize + 1) != (size_t)(-1))
{
// Do what's needed with the wcs string
}
delete[] wcs;
If you would rather use mbstowcs_s (because of deprecation warnings), then do this:
#include <cstdlib> // also for mbstowcs_s
const char* mbs = "c:\\user";
size_t requiredSize = 0;
::mbstowcs_s(&requiredSize, NULL, 0, mbs, 0);
wchar_t* wcs = new wchar_t[requiredSize + 1];
::mbstowcs_s(&requiredSize, wcs, requiredSize + 1, mbs, requiredSize);
if(requiredSize != 0)
{
// Do what's needed with the wcs string
}
delete[] wcs;
Make sure you take care of locale issues via setlocale() or by using the versions of mbstowcs() (such as mbstowcs_l() or mbstowcs_s_l()) that take a locale argument.
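To make the "one character per call" behavior concrete, here is a sketch that converts a whole string by calling the single-character function in a loop. It uses std::mbrtowc, the restartable sibling of mbtowc; the helper name widen_char_by_char is invented for illustration, and the input is interpreted in the current C locale.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <string>

// Hypothetical helper: converts a multibyte string one character at a time.
// Returns false if the input contains an invalid or truncated sequence.
bool widen_char_by_char(const char* mbs, std::wstring& out) {
    out.clear();
    std::mbstate_t state = std::mbstate_t();  // initial conversion state
    const char* p = mbs;
    std::size_t remaining = std::strlen(mbs);
    while (remaining > 0) {
        wchar_t wc;
        // Each call consumes the bytes of exactly one character.
        std::size_t used = std::mbrtowc(&wc, p, remaining, &state);
        if (used == static_cast<std::size_t>(-1) || used == static_cast<std::size_t>(-2))
            return false;                     // invalid or incomplete sequence
        if (used == 0)
            break;                            // decoded the null character
        out.push_back(wc);
        p += used;
        remaining -= used;
    }
    return true;
}
```

A single call, as in the question, stops after the first iteration of this loop, which is why only c ends up in the destination.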
Why are you using C code? Why not write it in a more portable way? For example, what I would do here is use the STL:
std::string src = std::string("C:\\user") +
std::string(" 1");
std::wstring dne = std::wstring(src.begin(), src.end()); // widens each char individually, so only correct for ASCII text
wcout << dne;
It's that simple. :D
L"Hello World"
The prefix L in front of the string literal makes it a wide char string.

How do you convert from a nsACString to a LPCWSTR?

I'm making a firefox extension (nsACString is from mozilla) but LoadLibrary expects a LPCWSTR. I googled a few options but nothing worked. Sort of out of my depth with strings so any references would also be appreciated.
It depends whether your nsACString (which I'll call str) holds ASCII or UTF-8 data:
ASCII
std::vector<WCHAR> wide(str.Length()+1);
std::copy(str.BeginReading(), str.EndReading(), wide.begin());
// I don't know whether nsACString has a terminating NUL, best to be sure
wide[str.Length()] = 0;
LPCWSTR newstr = &wide[0];
UTF-8
// get length in WCHARs (no NUL terminator is counted, because an explicit input length is passed)
int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
str.BeginReading(), str.Length(), 0, 0);
if (len == 0) panic(); // happens if input data is invalid UTF-8 (or empty)
// allocate enough space, plus one zero-initialized element that acts as the NUL terminator
std::vector<WCHAR> wide(len + 1);
// convert string
MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
str.BeginReading(), str.Length(), &wide[0], len);
LPCWSTR newstr = &wide[0];
This allocates only as much space as is needed - if you want faster code that potentially uses more memory than necessary, you can replace the first two lines with:
int len = str.Length() + 1;
This works because a conversion from UTF-8 to WCHAR never results in more characters than there were bytes of input.
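The ASCII branch above copies each byte into one wide character, which is only correct while every byte is below 0x80. A small portable sketch of that idea with an explicit check follows; it uses std::string and std::wstring in place of nsACString and the WCHAR vector, and the helper name widen_ascii is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: widens a string byte by byte, refusing anything that
// is not 7-bit ASCII (such input needs a real UTF-8 or code-page conversion).
bool widen_ascii(const std::string& narrow, std::wstring& wide) {
    wide.clear();
    wide.reserve(narrow.size());
    for (std::size_t i = 0; i < narrow.size(); ++i) {
        unsigned char byte = static_cast<unsigned char>(narrow[i]);
        if (byte >= 0x80) {
            wide.clear();
            return false;
        }
        wide.push_back(static_cast<wchar_t>(byte));
    }
    return true;
}
```

When it returns false, the input should go through the UTF-8 path instead of being widened blindly.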
Firstly note: LoadLibrary need not accept a LPWSTR. Only LoadLibraryW does. You may call LoadLibraryA directly (passing a narrow LPCSTR) and it will perform the translation for you.
If you choose to do it yourself however, below is one possible example.
nsACString sFoo = ...; // Some string.
size_t len = sFoo.Length() + 1;
WCHAR *swFoo = new WCHAR[len];
int written = MultiByteToWideChar(CP_ACP, 0, sFoo.BeginReading(), len - 1, swFoo, len);
swFoo[written] = 0; // Null-terminate right after the converted characters (written is 0 on failure).
...
delete [] swFoo;
nsACString a;
const char* pData;
PRUint32 iLen = NS_CStringGetData(a, &pData);