Using UNICODE character values in C++

How do you use Unicode in C++?
I'm aware of wchar_t and wchar_t*, but I want to know how to assign a value using only the Unicode code point, similar to the way a character can be assigned its ASCII value:
char a = 92;
I'm using the MinGW compiler, if it makes a difference.

It can be as simple as:
wchar_t a = L'a';
wchar_t hello_world[] = L"Hello World";
// Or if you really want it to be (old school) C++ and not C
std::wstring s(L"Hello World");
// Or if you want to (be bleeding edge and) use C++11
std::u16string s16(u"Hello World");
std::u32string s32(U"Hello World for the ∞ᵗʰ time");

Exactly the same way:
wchar_t a = 97;
wchar_t xi = 0x03be; // ξ
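If your compiler supports C++11 (recent MinGW releases do with -std=c++11), the fixed-width character types give you the same numeric-assignment style for code points that don't fit in a 16-bit wchar_t; a small sketch:
int main() {
    wchar_t  xi    = 0x03BE;        // ξ, assigned by numeric code point
    char16_t omega = u'\u03A9';     // Ω, via a universal character name
    char32_t cat   = U'\U0001F431'; // U+1F431 lies outside the BMP, so it needs char32_t
    (void)xi; (void)omega; (void)cat; // silence unused-variable warnings
    return 0;
}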


Converting string to wchar_t (wide character) in C++

Is there any method to do this?
My computer is AMD64.
::std::string str;
BOOL loadU(const wchar_t* lpszPathName, int flag = 0);
When I used:
loadU(&str);
the VS2005 compiler says:
Error 7 error C2664: cannot convert parameter 1 from 'std::string *__w64' to 'const wchar_t *'
How can I do it?
First convert it to std::wstring:
std::wstring widestr = std::wstring(str.begin(), str.end());
Then get the C string:
const wchar_t* widecstr = widestr.c_str();
This only works if the string contains only ASCII characters; it will fail if the underlying string is UTF-8 encoded. Using a conversion routine like MultiByteToWideChar() ensures that this scenario is handled properly.
If you have a std::wstring object, you can call c_str() on it to get a wchar_t*:
std::wstring name( L"Steve Nash" );
const wchar_t* szName = name.c_str();
Since you are operating on a narrow string, however, you would first need to widen it. There are various options here; one is to use Windows' built-in MultiByteToWideChar routine. That will give you an LPWSTR, which is equivalent to wchar_t*.
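A minimal sketch of that widening step, assuming the narrow string holds UTF-8 (substitute CP_ACP for the system code page); the helper name widen_utf8 is mine:
#include <windows.h>
#include <string>

std::wstring widen_utf8(const std::string& in)
{
    if (in.empty()) return std::wstring();
    // First call: ask how many wchar_ts are needed (the count includes the
    // terminating NUL because we pass -1 as the input length).
    int len = MultiByteToWideChar(CP_UTF8, 0, in.c_str(), -1, NULL, 0);
    std::wstring out(len, L'\0');
    // Second call: perform the conversion into the buffer.
    MultiByteToWideChar(CP_UTF8, 0, in.c_str(), -1, &out[0], len);
    out.resize(len - 1); // drop the trailing NUL the API wrote
    return out;
}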
You can use the ATL text conversion macros to convert a narrow (char) string to a wide (wchar_t) one. For example, to convert a std::string:
#include <atlconv.h>
...
std::string str = "Hello, world!";
CA2W pszWide(str.c_str());
loadU(pszWide);
You can also specify a code page, so if your std::string contains UTF-8 chars you can use:
CA2W pszWide(str.c_str(), CP_UTF8);
Very useful but Windows only.
If you are on Linux/Unix, have a look at mbstowcs() and wcstombs(), defined in GNU libc (from ISO C 90).
mbs stands for "multibyte string" and is basically the usual zero-terminated C string.
wcs stands for "wide character string" and is an array of wchar_t.
For more background details on wide characters, have a look at the glibc documentation.
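A short round-trip sketch with these two functions; it assumes a UTF-8 locale is installed (the name en_US.UTF-8 may differ on your system):
#include <clocale>
#include <cstdio>
#include <cstdlib>

int main() {
    // mbstowcs() decodes the narrow string according to the current locale,
    // so select a UTF-8 one first.
    std::setlocale(LC_ALL, "en_US.UTF-8");

    const char* narrow = "h\xC3\xA9llo"; // "héllo" in UTF-8
    wchar_t wide[64];
    if (std::mbstowcs(wide, narrow, 64) == (std::size_t)-1)
        return 1; // invalid multibyte sequence for this locale

    char back[64];
    std::wcstombs(back, wide, 64); // and back to a multibyte string
    std::printf("%s\n", back);
    return 0;
}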
I need to pass a wchar_t string to a function, and first need to be able to create that string from a literal string concatenated with an integer variable.
The original string looks like this, where 4 is the physical drive number, but I want that number to be changeable to match whatever drive I want to pass to the function:
auto TargetDrive = L"\\\\.\\PhysicalDrive4";
The following works:
int a = 4;
std::string stddrivestring = "\\\\.\\PhysicalDrive" + std::to_string(a);
std::wstring widedrivestring = std::wstring(stddrivestring.begin(), stddrivestring.end());
const wchar_t* TargetDrive = widedrivestring.c_str(); // valid only while widedrivestring stays in scope

Convert Unicode to char

How can I convert a Unicode string to a char* or char* const in Embarcadero C++?
String text = "Hello world";
AnsiString ansi(text);
char *txt = ansi.c_str(); // keep the AnsiString alive while the pointer is in use
The older text.t_str() is now AnsiString(String).c_str().
"Unicode string" really isn't specific enough to know what your source data is, but you probably mean 'UTF-16 string stored as wchar_t array' since that's what most people who don't know the correct terminology use.
"char*" also isn't enough to know what you want to target, although maybe "embarcadero" has some convention. I'll just assume you want UTF-8 data unless you mention otherwise.
Also, I'll limit my example to what works in VS2010:
// your "Unicode" string
wchar_t const * utf16_string = L"Hello, World!";
// #include <codecvt>
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>,wchar_t> convert;
std::string utf8_string = convert.to_bytes(utf16_string);
This assumes that wchar_t strings are UTF-16, as is the case on Windows, but otherwise is portable code.
You can legally reinterpret any array as an array of char. So if your Unicode data comes in 4-byte code units like
char32_t data[100];
then you can access it as a char array:
// #include <cstdio>
char const * p = reinterpret_cast<char const*>(data);
for (std::size_t i = 0; i != sizeof data; ++i)
{
    // cast to unsigned char to avoid sign extension when printing the byte
    std::printf("Byte %03zu is 0x%02X.\n", i, static_cast<unsigned char>(p[i]));
}
That way, you can examine the individual bytes of your Unicode data one by one.
(That has of course nothing to do with converting the encoding of your text. For that, use a library like iconv or ICU.)
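With iconv, for example, converting UTF-32 data to UTF-8 looks roughly like this (a sketch with error handling trimmed; it assumes little-endian char32_t data, hence "UTF-32LE"):
#include <iconv.h>
#include <cstddef>
#include <string>

std::string utf32_to_utf8(const char32_t* data, std::size_t count)
{
    iconv_t cd = iconv_open("UTF-8", "UTF-32LE");
    char* in = reinterpret_cast<char*>(const_cast<char32_t*>(data));
    std::size_t in_left = count * sizeof(char32_t);

    std::string out(count * 4, '\0'); // UTF-8 needs at most 4 bytes per code point
    char* out_p = &out[0];
    std::size_t out_left = out.size();

    iconv(cd, &in, &in_left, &out_p, &out_left); // converts as much as it can
    iconv_close(cd);
    out.resize(out.size() - out_left); // keep only the bytes actually written
    return out;
}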
If you work with Windows:
//#include <windows.h>
std::u16string utext = u"объява";
char text[0x100];
// On Windows wchar_t is 16 bits wide, so the cast from char16_t* is safe there.
WideCharToMultiByte(CP_UTF8, 0, reinterpret_cast<const wchar_t*>(utext.c_str()),
                    -1, text, sizeof(text), NULL, NULL);
std::cout << text;
We can't use std::wstring_convert here, because it is not available in MinGW 4.9.2.

C++ Unicode Bullet Point

I am trying to insert the Unicode character U+2022 (bullet •) in my C++ application.
I can't figure out how to convert that U+2022 to a char/string for use in std::string constructor...
char bullet = char(0x2022);
mPassword.SetText( std::string(mText.length(), bullet) );
This one doesn't work. Hope you can help!
Thanks!
A wide character literal has type wchar_t (see §2.13.4 of the C++ Standard). You could use it as follows:
wchar_t bullet = L'\x2022';
In a string it will look like this:
std::wstring str_w_bullet( L"some text with \x2022" );
Use std::wstring, which is the same as std::string but specialized on wchar_t.
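If you need the bullet in a narrow std::string instead (say, for an API that expects UTF-8), you can spell out its UTF-8 encoding by hand; a minimal sketch, with 8 standing in for mText.length():
#include <cstddef>
#include <string>

int main() {
    // U+2022 (bullet) encodes to the three bytes E2 80 A2 in UTF-8.
    const std::string bullet = "\xE2\x80\xA2";

    std::string mask;
    for (std::size_t i = 0; i != 8; ++i) // one bullet per password character
        mask += bullet;
    return 0;
}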

How to convert std::wstring to a TCHAR*?

How to convert a std::wstring to a TCHAR*? std::wstring.c_str() does not work since it returns a wchar_t*.
How do I get from wchar_t* to TCHAR*, or from std::wstring to TCHAR*?
Use this:
wstring str1(L"Hello world");
TCHAR* v1 = const_cast<TCHAR*>(str1.c_str()); // only valid in UNICODE builds, where TCHAR is wchar_t
#include <atlconv.h>
USES_CONVERSION; // required in scope for the classic ATL conversion macros
TCHAR *dst = W2T(src.c_str());
Will do the right thing in ANSI or Unicode builds.
TCHAR* is defined to be wchar_t* if UNICODE is defined, otherwise it's char*. So your code might look something like this:
wchar_t* src;
TCHAR* result;
#ifdef UNICODE
result = src;
#else
//I think W2A is defined in atlbase.h (it needs USES_CONVERSION in scope),
//and it returns a stack-allocated var.
//If that's not OK, look at the documentation for wcstombs.
result = W2A(src);
#endif
In general this is not possible, since wchar_t may not be the same size as TCHAR.
Several solutions are already listed for converting between character sets; these can work if the character sets overlap for the range being converted.
I prefer to sidestep the issue entirely wherever possible and use a standard string that is defined over the TCHAR character type, as follows:
typedef std::basic_string<TCHAR> tstring;
Using this, you now have a standard-library-compatible string that is also compatible with the Windows TCHAR macro.
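Put together, a tstring behaves like std::string in ANSI builds and like std::wstring in Unicode builds; a quick sketch:
#include <string>
#include <tchar.h>

typedef std::basic_string<TCHAR> tstring;

int main() {
    tstring s = _T("Hello");    // narrow or wide literal, chosen per build
    const TCHAR* p = s.c_str(); // matches TCHAR*-taking Windows APIs
    (void)p;
    return 0;
}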
You can use:
wstring ws = L"Testing123";
string s(ws.begin(), ws.end()); // truncates each wchar_t to char, so only safe for ASCII text
// s.c_str() is what you're after
Assuming that you are operating on Windows.
If you are in a Unicode build configuration, then TCHAR and wchar_t are the same thing. You might need a cast depending on whether you compile with /Zc:wchar_t (wchar_t is a native type) or /Zc:wchar_t- (wchar_t is a typedef).
If you are in a multibyte build configuration, you need MultiByteToWideChar (and vice versa, WideCharToMultiByte).

Assigning a "const char*" to std::string is allowed, but assigning to std::wstring doesn't compile. Why?

I assumed that std::wstring and std::string both provide more or less the same interface, so I tried to enable Unicode capabilities for our application:
# ifdef APP_USE_UNICODE
typedef std::wstring AppStringType;
# else
typedef std::string AppStringType;
# endif
However that gives me a lot of compile errors when -DAPP_USE_UNICODE is used.
It turned out that the compiler chokes when a const char[] is assigned to std::wstring.
EDIT: improved example by removing the usage of literal "hello".
#include <string>
void myfunc(const char h[]) {
    std::string s = h;  // compiles OK
    std::wstring w = h; // compile error
}
Why does it make such a difference?
Assigning a const char* to std::string is allowed, but assigning to std::wstring gives compile errors.
Shouldn't std::wstring provide the same interface as std::string? At least for such a basic operation as assignment?
(environment: gcc-4.4.1 on Ubuntu Karmic 32bit)
You should do:
#include <string>
int main() {
const wchar_t h[] = L"hello";
std::wstring w = h;
return 0;
}
std::string is a typedef of std::basic_string<char>, while std::wstring is a typedef of std::basic_string<wchar_t>. As such, the 'equivalent' C-string of a wstring is an array of wchar_ts.
The 'L' in front of the string literal is to indicate that you are using a wide-char string constant.
The relevant part of the string API is this constructor:
basic_string(const charT*);
For std::string, charT is char. For std::wstring it's wchar_t. So the reason it doesn't compile is that wstring doesn't have a char* constructor. Why doesn't wstring have a char* constructor?
There is no one unique way to convert a string of char to a string of wchar. What's the encoding used with the char string? Is it just 7 bit ASCII? Is it UTF-8? Is it UTF-7? Is it SHIFT-JIS? So I don't think it would entirely make sense for std::wstring to have an automatic conversion from char*, even though you could cover most cases. You can use:
w = std::wstring(h, h + std::strlen(h));
which will convert each char in turn to wchar_t (stopping before the NUL terminator), and in this example that's probably what you want. (Note that sizeof(h) would not work here: an array function parameter decays to a pointer; std::strlen needs <cstring>.) As int3 says though, if that's what you mean, it's most likely better to use a wide string literal in the first place.
To convert from a multibyte encoding to a wide character encoding, take a look at the header <locale> and the type std::codecvt. The Dinkumware library has a class Dinkum::wstring_convert that makes performing such multibyte-to-wide conversions easier.
The class template std::codecvt_byname allows one to obtain a codecvt instance for a particular named encoding. Unfortunately, discovering the names of the encodings (or locales) on your system is implementation-specific.
Small suggestion... Do not use "Unicode" strings under Linux (a.k.a. wide strings). std::string is perfectly fine and holds Unicode very well (UTF-8).
Most Linux APIs work with char* strings, and the most popular encoding is UTF-8.
So... Just don't bother yourself using wstring.
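For instance, a plain std::string carries UTF-8 bytes without any trouble; a tiny sketch:
#include <cstdio>
#include <string>

int main() {
    std::string s = "\xCE\xBE and \xE2\x80\xA2"; // "ξ and •" as raw UTF-8 bytes
    std::printf("%s (%zu bytes)\n", s.c_str(), s.size());
    return 0;
}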
In addition to the other answers, you could use a trick from Microsoft's book (specifically, tchar.h), and write something like this:
# ifdef APP_USE_UNICODE
typedef std::wstring AppStringType;
#define _T(s) (L##s)
# else
typedef std::string AppStringType;
#define _T(s) (s)
# endif
AppStringType foo = _T("hello world!");
(Note: my macro-fu is weak, and this is untested, but you get the idea.)
Looks like you can do something like this:
#include <sstream>
// ...
std::wstringstream tmp;
tmp << "hello world";
std::wstring our_string = tmp.str();
For a more complex situation, though, you may want to break down and use mbstowcs.
You should use:
#include <tchar.h>
tstring instead of wstring/string
TCHAR* instead of char*
and _T("hello") instead of "hello" or L"hello"
This will use the appropriate form of string and character when _UNICODE is defined.