How to convert from std::string to SQLWCHAR* - c++

I'm trying to convert std::string to SQLWCHAR*, but I couldn't find how.
Any brilliant suggestion, please?

One solution would be to simply use a std::wstring in the first place, rather than std::string. With the Unicode character set you can define a wide string literal using the following syntax:
std::wstring wstr = L"hello world";
However, if you would like to stick with std::string then you will need to convert the string to a different encoding. How to do that depends on how your std::string is encoded. The default encoding on Windows is ANSI (although UTF-8 is usually the encoding you get when reading files or downloading text from websites).
This answer shows a function for converting a std::string to a std::wstring on Windows (using the MultiByteToWideChar function).
https://stackoverflow.com/a/27296/966782
Note that the answer uses CP_ACP as the input encoding (i.e. ANSI). If your input string is UTF8 then you can change to use CP_UTF8 instead.
Once you have a std::wstring you should be able to easily retrieve a SQLWCHAR* using:
std::wstring::c_str()
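For reference, here is a minimal Windows-only sketch of that whole path (the helper name to_wstring_win is just for illustration; swap CP_ACP for CP_UTF8 if your input is UTF-8):
#include <windows.h>
#include <string>

// Converts a narrow string to a wide string using the given code page.
std::wstring to_wstring_win(const std::string& in, UINT codePage = CP_ACP)
{
    if (in.empty()) return std::wstring();
    // First call asks how many wide characters are needed.
    int len = MultiByteToWideChar(codePage, 0, in.data(), (int)in.size(), nullptr, 0);
    std::wstring out(len, L'\0');
    // Second call performs the actual conversion.
    MultiByteToWideChar(codePage, 0, in.data(), (int)in.size(), &out[0], len);
    return out;
}

// Usage: on Windows SQLWCHAR is a 16-bit character type, so a cast from
// const wchar_t* is usually all that is needed. Many ODBC functions take a
// non-const SQLWCHAR*, hence the C-style cast for input-only parameters.
// std::wstring w = to_wstring_win(myStdString);
// SQLWCHAR* p = (SQLWCHAR*)w.c_str();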

Related

Convert from std::wstring to std::string

I'm converting a wstring to a string with std::codecvt_utf8 as described in this question, but when I try Greek or Chinese alphabet symbols they come out corrupted. I can see it in the debugger's Locals window; for example, 日本 became "æ—¥æœ¬".
std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv; //also tried codecvt_utf8_utf16
std::string str = myconv.to_bytes(wstr);
What am I doing wrong?
std::string simply holds an array of bytes. It does not hold information about the encoding in which these bytes are supposed to be interpreted, nor do the standard library functions or std::string member functions generally assume anything about the encoding. They handle the contents as just an array of bytes.
Therefore when the contents of a std::string need to be presented, the presenter needs to make some guess about the intended encoding of the string, if that information is not provided in some other way.
I am assuming that the encoding you intend to convert to is UTF8, given that you are using std::codecvt_utf8.
But if you are using Visual Studio, the debugger simply assumes one specific encoding, at least by default. That encoding is not UTF-8, but most likely code page 1252.
As verification, python gives the following:
>>> '日本'.encode('utf8').decode('cp1252')
'æ—¥æœ¬'
Your string does seem to be the UTF8 encoding of 日本 interpreted as if it was cp1252 encoded.
Therefore the conversion seems to have worked as intended.
As mentioned by @MarkTolonen in the comments, the encoding to assume for a string variable can be set to UTF-8 in the Visual Studio debugger with the s8 format specifier, as explained in the documentation.
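If you want to double-check outside the debugger, a small sketch like the following (it just prints the converted bytes in hex) shows that the conversion really does produce the expected UTF-8 sequence:
#include <codecvt>
#include <cstdio>
#include <locale>
#include <string>

int main()
{
    std::wstring wstr = L"\u65e5\u672c"; // 日本, written with escapes to avoid source-encoding issues
    std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
    std::string str = myconv.to_bytes(wstr);

    // Expected output: E6 97 A5 E6 9C AC (the UTF-8 encoding of 日本)
    for (unsigned char c : str)
        std::printf("%02X ", c);
    std::printf("\n");
}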

Problems converting std::wstring to UnicodeString using "UTF8ToUnicodeString"

In my project (using Embarcadero C++Builder), I am trying to convert a std::wstring to a UnicodeString using UTF8ToUnicodeString() from <system.hpp>.
The result shows some replacement characters (U+FFFD) for some Russian and Vietnamese characters. But most characters are shown correctly.
Does anybody know what the problem could be? Is it a problem with codepages?
First, neither std::wstring nor UnicodeString use UTF-8, so you should not be using UTF8ToUnicodeString() at all in this situation. UnicodeString uses UTF-16 on all platforms. std::wstring uses UTF-16 on Windows, and UTF-32 on most other platforms.
Second, std::wstring is a wchar_t-based string. UnicodeString uses wchar_t on Windows, and char16_t on other platforms. It has constructors that accept C-style wchar_t* string pointers as input, and will convert the data to UTF-16 if needed.
So, you can simply use the std::wstring::c_str() method to convert a std::wstring to a UnicodeString, e.g.:
std::wstring w = ...;
UnicodeString u = w.c_str();
Alternatively:
std::wstring w = ...;
UnicodeString u(w.c_str(), w.size());
If you try to assign a wchar_t* string to a RawByteString, such as for the input to UTF8ToUnicodeString(), the RTL will perform a Unicode->ANSI conversion to the default system ANSI codepage specified by System::DefaultSystemCodePage. That codepage is not universally UTF-8 on all platforms (especially not on Windows), which is why you may lose characters or even end up with mojibake.

Is a wstring character Unicode? What happens during conversion?

Recently I have been coming across conversions between UTF-8 encoded data and string, and vice versa. I understand that UTF-8 can represent almost all the characters in the world, while char, the built-in element type of string, can only hold ASCII values. A character in UTF-8 encoding may require anywhere from one to four bytes of memory, but a 'char' is usually one byte.
My question is: what happens in a conversion from wstring to string, or from wchar_t to char?
Are the characters which require more than one byte skipped? It seems to depend on the implementation, but I want to know what the correct way of doing it is.
Also, is wchar_t required to store Unicode characters? As far as I understood, Unicode characters can be stored in a normal string as well. Why should we use wstring or wchar_t?
Depends how you convert them.
You need to specify the source encoding type and the target encoding type.
wstring is not a format, it just defines a data type.
Now usually when one says "Unicode", one means UTF-16, which is what Microsoft Windows uses, and that is usually what a wstring contains.
So, the right way to convert from UTF8 to UTF16:
std::string utf8String = "blah blah";
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> convert;
std::wstring utf16String = convert.from_bytes( utf8String );
And the other way around:
std::wstring utf16String = L"blah blah";
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> convert;
std::string utf8String = convert.to_bytes( utf16String );
And to add to the confusion:
When you use std::string on a Windows platform (as with a multibyte compilation), it is NOT UTF-8. It is ANSI.
More specifically, the default ANSI code page your Windows installation is using.
The Windows API functions come in two variants and expect these formats:
CommandA - multibyte - ANSI
CommandW - Unicode - UTF-16
Make your source files UTF-8 encoded, set the character encoding to UNICODE in your IDE.
Use std::string and widen them for Windows API calls.
std::string somestring = "こんにちは";
WindowsApiW(widen(somestring).c_str());
I know it sounds kind of hacky, but a more profound explanation of this issue can be found at utf8everywhere.org.
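The widen() helper used above is not a standard or Windows API function; a minimal Windows-only sketch of it, assuming the std::string holds UTF-8, could look like this:
#include <windows.h>
#include <string>

// Converts a UTF-8 encoded std::string to a UTF-16 std::wstring.
std::wstring widen(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();
    // Ask how many wchar_t's the result needs, then convert.
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), (int)utf8.size(), nullptr, 0);
    std::wstring wide(len, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), (int)utf8.size(), &wide[0], len);
    return wide;
}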

How to convert UTF-8 encoded std::string to UTF-16 std::string

How can I convert a UTF-8 encoded std::string to a UTF-16 std::string? Is it possible?
And no, I can't use std::wstring in my case.
Windows, MSVC-11.0.
How about trying it like this:
// #include <codecvt>
std::string s = u8"Your string";
// Note: some standard libraries require std::codecvt_utf8_utf16<char16_t> here
// instead, because std::codecvt's destructor is protected.
std::wstring_convert<std::codecvt<char16_t, char, std::mbstate_t>, char16_t> convert;
std::u16string u16 = convert.from_bytes(s);   // UTF-8  -> UTF-16
std::string u8 = convert.to_bytes(u16);       // UTF-16 -> UTF-8
Also check this for UTF-to-UTF conversion.
From the docs:
The specialization codecvt<char16_t, char, std::mbstate_t> converts between the UTF-16 and UTF-8 encoding schemes, and the specialization codecvt<char32_t, char, std::mbstate_t> converts between the UTF-32 and UTF-8 encoding schemes.
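For completeness, the UTF-32 counterpart mentioned in that quote works the same way. Continuing the snippet above, a sketch using codecvt_utf8<char32_t> (which converts between UTF-8 and UTF-32) would be:
// #include <codecvt>
std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> convert32;
std::u32string u32 = convert32.from_bytes(s);   // UTF-8  -> UTF-32
std::string back = convert32.to_bytes(u32);     // UTF-32 -> UTF-8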
I've come across dozens of such problems trying to do this and similar things with Visual Studio, and just gave up. There is a known linking issue when doing conversions with e.g. std::wstring_convert and std::codecvt.
Please see here:
Convert C++ std::string to UTF-16-LE encoded string
What I did to resolve my problem was to copy in the code from a kind poster, which uses the iconv library. Then all I had to do was call convert(my_str, strlen(my_str), &used_bytes), where my_str was a char[], strlen(my_str) was its length, and size_t used_bytes = strlen(my_str)*3; I just gave it enough bytes to work with. In that function you can change iconv_t foo = iconv_open("UTF-16", "UTF-8"), and investigate the setlocale() call and the creation of the enc string passed to iconv_open(), all of which is sitting there in its glory in the link above.
The gotcha is compiling and using iconv: it almost expects Cygwin or the like on Windows, but you can use it with Visual Studio. There is a purely Win32 libiconv at https://github.com/win-iconv/win-iconv which might suit your needs.
I would say give iconv a try, and see how it goes in a short test program. Good luck!
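For orientation only, here is a rough sketch of what the core of such a test program might look like (illustrative names and buffer sizing; real code should check iconv's return value for (size_t)-1 and handle errors):
#include <iconv.h>
#include <string>
#include <vector>

// Converts a UTF-8 std::string to UTF-16LE stored in a std::u16string.
std::u16string utf8_to_utf16(const std::string& in)
{
    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");   // to-encoding, from-encoding

    std::vector<char> buf(in.size() * 4 + 4);       // generous output buffer
    char* inPtr = const_cast<char*>(in.data());     // POSIX iconv takes char**
    size_t inLeft = in.size();
    char* outPtr = buf.data();
    size_t outLeft = buf.size();

    // Note: some iconv builds declare the input buffer as const char**;
    // if so, drop the const_cast and pass a const pointer instead.
    iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
    iconv_close(cd);

    size_t bytes = buf.size() - outLeft;
    return std::u16string(reinterpret_cast<const char16_t*>(buf.data()),
                          bytes / sizeof(char16_t));
}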

Why use MultiByteToWideCharArray to convert std::string to std::wstring?

I want to convert a std::string to a std::wstring. There are two approaches which I have come across.
Given a string str, we can convert it into a wide string using the following code:
wstring widestring = std::wstring(str.begin(),str.end());
The other approach is to use MultiByteToWideCharArray().
What I wanted to understand is: what is the drawback of using the first approach, and how does the second approach solve this problem?
MultiByteToWideChar offers more options (like the ability to select "code pages") and translates non-ASCII characters correctly.
The first option doesn't support multibyte encoding. It will iterate through each byte (char) in the string and convert it to a wide character. When you have a string with multibyte encoding, individual characters can take more than one byte, so a standard string iterator is inappropriate.
The MultiByteToWideChar function has support for different multibyte formats, as specified by the codepage parameter.
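To make the difference concrete, here is a small sketch (assuming the input bytes are UTF-8; substitute the appropriate code page otherwise):
#include <windows.h>
#include <string>

int main()
{
    // "héllo" encoded as UTF-8: the é is two bytes, 0xC3 0xA9.
    std::string str = "h\xC3\xA9llo";

    // First approach: copies each byte into its own wchar_t,
    // so the two bytes of é become two garbage characters.
    std::wstring bad(str.begin(), str.end());       // size() == 6, mojibake

    // Second approach: decodes the byte sequence according to the code page,
    // so é becomes a single wchar_t (U+00E9).
    int len = MultiByteToWideChar(CP_UTF8, 0, str.data(), (int)str.size(), nullptr, 0);
    std::wstring good(len, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, str.data(), (int)str.size(), &good[0], len);
    // good.size() == 5, contains L"héllo"

    return 0;
}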