How can I use UTF-8 string as char* in Visual C++? - c++

I need to create a library which will expose some C APIs.
get_ble_list(char **deviceNames);
Here I use another third-party library to get the list of Bluetooth devices, which returns the device names as CAtlStringW.
Now I need to convert this into a normal C character array, which I am able to do. But when it comes to non-ASCII Unicode characters, I don't get the expected result.
E.g.: I set the Bluetooth name on my mobile to മലയാളം, but I got ??? as the output in my test GUI app which uses my library.

The source string is CStringW, not CAtlStringW (which is a base class). Use CW2A and CA2W to convert between UTF-16 and UTF-8.
#include <Windows.h>
#include <atlstr.h> // Visual Studio specific
...
CStringA utf8 = CW2A(L"മലയാളം", CP_UTF8);
CStringW utf16 = CA2W(utf8, CP_UTF8);
MessageBoxW(0, utf16, 0, 0);
You can cast utf8 to const char*, or cast utf16 to const wchar_t*.
Neither is a writable buffer. Don't cast either CAtlStringW or CStringW to wchar_t*, and certainly not to char*, which is completely wrong.
For a writable buffer, use the CString::GetBuffer/CString::ReleaseBuffer methods, or allocate a new buffer to pass. Make sure the source compiles with zero warnings.
You can switch to UTF-16 if the C APIs are used exclusively on Windows. Use wchar_t instead of char, and use the "wide C string" versions of the C functions, for example wcscpy instead of strcpy.
void get_ble_list(wchar_t **deviceNames)
{
    ...
    wcscpy(...);
}
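Since the exported C API hands out char*, the wide device name has to be transcoded to UTF-8 at some point. On Windows that is what CW2A with CP_UTF8 (or WideCharToMultiByte) does; purely as a platform-neutral illustration of that step, here is a hand-rolled UTF-16 to UTF-8 sketch. The function name and the use of std::u16string are my own choices, not part of the original answer:

```cpp
#include <cstdint>
#include <string>

// Hand-rolled UTF-16 -> UTF-8 conversion (a sketch; on Windows you would
// normally use WideCharToMultiByte or CW2A with CP_UTF8 instead).
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        uint32_t cp = in[i];
        // Combine a surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            uint32_t lo = in[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80) {                       // 1-byte sequence (ASCII)
            out += char(cp);
        } else if (cp < 0x800) {               // 2-byte sequence
            out += char(0xC0 | (cp >> 6));
            out += char(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3-byte sequence
            out += char(0xE0 | (cp >> 12));
            out += char(0x80 | ((cp >> 6) & 0x3F));
            out += char(0x80 | (cp & 0x3F));
        } else {                               // 4-byte sequence
            out += char(0xF0 | (cp >> 18));
            out += char(0x80 | ((cp >> 12) & 0x3F));
            out += char(0x80 | ((cp >> 6) & 0x3F));
            out += char(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

A Malayalam letter such as മ (U+0D2E) comes out as the three bytes E0 B4 AE; if the GUI then shows ???, the problem is on the display side, not in the conversion.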

Related

Proper way to cross-platform convert from std::string to 'const TCHAR *'

I'm working on a cross-platform project in C++ and I have a variable of type std::string which I need to convert to const TCHAR *. What is the proper way? Maybe functions from some library?
UPD 1: as I see in the function definition, there are split Windows and non-Windows implementations:
#if defined _MSC_VER || defined __MINGW32__
#define _tinydir_char_t TCHAR
#else
#define _tinydir_char_t char
#endif
- so is there really no way to avoid splitting the implementation when passing a parameter from std::string?
TCHAR should not be used in cross-platform programs at all, except of course when interacting with Windows API calls; but those need to be abstracted away from the rest of the program, or else it won't be cross-platform. So you only need to convert between TCHAR strings and char strings in Windows-specific code.
The rest of the program should use char, and preferably assume that it contains UTF-8 encoded strings. If user input or system calls return strings in a different encoding, you need to figure out what that encoding is and convert accordingly.
The character encoding conversion functionality of the C++ standard library is rather weak, so it is not of much use. You can implement the conversion according to the encoding specification, or you can use a third-party implementation, as always.
maybe functions from some library?
I recommend this.
as I see in the function definition, there are split Windows and non-Windows implementations
The library that you use doesn't provide a uniform API to different platforms, so it cannot be used in a truly cross-platform way. You can write a wrapper library with uniform function declarations that handles the character encoding conversion on platforms that need it.
Or, you can use another library, which provides a uniform API and converts the encoding transparently.
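A minimal sketch of such a wrapper, assuming the convention that the public API always takes UTF-8 in a plain std::string and converts to the platform's native string type internally. The names to_native and native_string are hypothetical, not from any particular library:

```cpp
#include <string>

#if defined(_WIN32)
#include <windows.h>
// On Windows the native type is wide; convert UTF-8 -> UTF-16.
using native_string = std::wstring;
inline native_string to_native(const std::string& utf8) {
    if (utf8.empty()) return std::wstring();
    int n = MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                                (int)utf8.size(), nullptr, 0);
    std::wstring out(n, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                        (int)utf8.size(), &out[0], n);
    return out;
}
#else
// Elsewhere char is already the native type; UTF-8 passes through untouched.
using native_string = std::string;
inline native_string to_native(const std::string& utf8) { return utf8; }
#endif
```

Callers then write to_native(myString).c_str() at the API boundary, and the #ifdef stays confined to this one wrapper instead of spreading through the program.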
TCHAR is a Windows type, and it is defined this way:
#ifdef UNICODE
typedef wchar_t TCHAR, *PTCHAR;
#else
typedef char TCHAR, *PTCHAR;
#endif
The UNICODE macro is typically defined in the project settings (when you use a Visual Studio project on Windows).
You can get the const TCHAR* from a std::string (which is ASCII or UTF-8 in most cases) this way:
std::string s("hello world");
const TCHAR* pstring = nullptr;
#ifdef UNICODE
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
std::wstring wstr = converter.from_bytes(s);
pstring = wstr.data();
#else
pstring = s.data();
#endif
pstring will be the result.
But it is highly recommended not to use TCHAR on other platforms. It's better to use UTF-8 strings (char*) within std::string.
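The snippet above, made self-contained and compilable. Note that std::wstring_convert and std::codecvt_utf8_utf16 are deprecated since C++17, though major compilers still ship them:

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Self-contained version of the conversion above. std::wstring_convert and
// std::codecvt_utf8_utf16 are deprecated since C++17 but still widely available.
std::wstring utf8_to_wide(const std::string& s) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.from_bytes(s);
}
```

One caveat with the original snippet: pstring points into wstr, so it is only valid while wstr is alive; returning a std::wstring by value, as here, sidesteps that.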
I came across boost.nowide the other day. I think it will do exactly what you want.
http://cppcms.com/files/nowide/html/
As others have pointed out, you should not be using TCHAR except in code that interfaces with the Windows API (or libraries modeled after the Windows API).
Another alternative is to use the character conversion classes/macros defined in atlconv.h. CA2T will convert an 8-bit character string to a TCHAR string. CA2CT will convert to a const TCHAR string (LPCTSTR). Assuming your 8-bit strings are UTF-8, you should specify CP_UTF8 as the code page for the conversion.
If you want to declare a variable containing a TCHAR copy of a std::string:
CA2T tstr(stdstr.c_str(), CP_UTF8);
If you want to call a function that takes an LPCTSTR:
FunctionThatTakesString(CA2CT(stdstr.c_str(), CP_UTF8));
If you want to construct a std::string from a TCHAR string:
std::string mystdstring(CT2CA(tstr, CP_UTF8));
If you want to call a function that takes an LPTSTR then maybe you should not be using these conversion classes. (But you can if you know that the function you are calling does not modify the string outside its current length.)

Is MFC CString a wide char string

I'm working on a Win32 project with CStrings (a console application), and I noticed something odd: when I want to pass a LPSTR obtained from a CString via GetBuffer() to a function (like strtok_s, for example), GetBuffer() gives me an LPWSTR (a pointer to a wide string) instead of an LPSTR... CString is supposed to store 8-bit chars, isn't it?
In some cases I'm obliged to use CStringA, for example to be able to use the Find() method, because with a CString my input string must be a wide one. But in another project (a windowed program) I don't have this problem; I suspect the headers (when I use afxstr.h, Find works with a normal string, but not with afxcoll.h...).
Usually I work with std::string; that's why I'm lost.
CString is a typedef, declared as (afxstr.h):
typedef ATL::CStringT< TCHAR, StrTraitMFC< TCHAR > > CString;
// Or, when using the MFC DLL
typedef ATL::CStringT< TCHAR, StrTraitMFC_DLL< TCHAR > > CString;
Depending on what TCHAR is, a CString stores either an ANSI (MBCS) or Unicode string. There are also explicit instantiations of the CStringT template: CStringW and CStringA.
Either type has a conversion constructor, taking a constant pointer to the respective other character encoding. In other words: You can construct a CStringW from an ANSI (MBCS) string, as well as a CStringA from a UTF-16LE-encoded Unicode string.
If you need to be explicit about the character encoding, use either CStringW or CStringA.
Full documentation for CString is available at CStringT Class.

MFC, Unicode and DDX_Text

UTF-8 everywhere makes a strong case to shun the Microsoft TCHAR, _T(), LPCTSTR and so forth completely, to push wchar_t aside as well, and bravely embrace a world of UTF-8 strings based on a narrow char type.
Which seemed fine until I came to the MFC DDX_Text() macro for getting a CString both in and out of an edit control. Is there any reasonable way to:
Declare CStringA myString intended as a UTF-8 string (or as an ASCII/ANSI string in the degenerate case)
Compile with UNICODE defined
Pass myString through suitable conversions and/or temporary variables and into the third parameter to DDX_Text(), and get sensible results to and from the associated edit control?
If not, how would you recommend handling string input/output via edit controls if your application wanted to use UTF-8 (or ASCII/ANSI in the degenerate case)?
(P.S. this is motivated by Visual Studio 2013 encouraging Unicode-only use of MFC. Given an MFC app, and a desire to use VS2013, this requires UNICODE to be defined... or to cling on to a deprecated way of doing things.)
Windows internally uses UTF-16 as its Unicode encoding, so you will have to follow that and use CString, which is defined as CStringW under UNICODE. You also have to use the _T() macro. All Windows common controls like edit boxes, list boxes, etc. use Unicode as well.
I'd suggest using UTF-8 for networking stuff only.
// UTF8 conversion
CStringA CUtility::UTF16toUTF8(const CStringW& utf16)
{
    return CW2A(utf16, CP_UTF8);
}

CStringW CUtility::UTF8toUTF16(const CStringA& utf8)
{
    return CA2W(utf8, CP_UTF8);
}
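Those helpers rely on ATL. As a rough platform-neutral equivalent, the UTF-8 to UTF-16 direction can be sketched by hand like this (the function name and std::u16string are my own choices; the sketch does not validate malformed input, which real code should):

```cpp
#include <cstdint>
#include <string>

// Hand-rolled UTF-8 -> UTF-16 decoding (no validation of malformed input;
// production code should use CA2W/MultiByteToWideChar or a library like ICU).
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        uint32_t cp = 0;
        unsigned char b = in[i];
        int extra = 0;                       // number of continuation bytes
        if (b < 0x80)      { cp = b;        extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else               { cp = b & 0x07; extra = 3; }
        ++i;
        for (int k = 0; k < extra && i < in.size(); ++k, ++i)
            cp = (cp << 6) | (in[i] & 0x3F); // accumulate 6 bits per byte
        if (cp >= 0x10000) {
            cp -= 0x10000;                   // encode as a surrogate pair
            out += char16_t(0xD800 | (cp >> 10));
            out += char16_t(0xDC00 | (cp & 0x3FF));
        } else {
            out += char16_t(cp);
        }
    }
    return out;
}
```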

Implicit convert CString to char*

I have downloaded some sample code in which there are CString variables that are passed to the sscanf() function as char*; the compiler implicitly converts those CStrings and the code compiles fine. The code which works is here:
CString m_strVersionXFS;
m_strVersionXFS = _T("00029903");
DWORD nVersion;
sscanf(m_strVersionXFS,"%08X",&nVersion);
The problem is that when I tried to write my own simple code that manipulates a CString variable in the same way, the compiler says it can't convert a CString to a char*.
I suspect that your own code is using unicode (UNICODE constant defined). This means that CString is using wide characters, and will implicitly convert to wchar_t*, but not to char*.
If that is the case, there are three possible solutions:
Use swscanf, the wide character version of sscanf (recommended);
Explicitly use CStringA instead of CString so you're using the ANSI version of CString;
Change your project to use multi-byte characters rather than unicode (this can be done from the project properties in Visual Studio).
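A minimal sketch of the recommended option 1, using swscanf with a wide format string. This is plain standard C++, which is also what a UNICODE-build CString converts to implicitly; the wrapper function name is my own:

```cpp
#include <cwchar>

// Parse a fixed-width hex version string the wide-character way,
// mirroring the sscanf example from the question.
unsigned int parse_version(const wchar_t* s) {
    unsigned int v = 0;
    std::swscanf(s, L"%08X", &v);  // %X expects unsigned int*
    return v;
}
```

In a UNICODE build, a CString passed where const wchar_t* is expected converts implicitly, so parse_version(m_strVersionXFS) works as the sample code did with sscanf.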

Convert ICU UnicodeString to platform dependent char * (or std::string)

In my application I use ICU UnicodeString to store my strings. Since I use some libraries incompatible with ICU, I need to convert UnicodeString to its platform dependent representation.
Basically, what I need to do is reverse the process of creating a new UnicodeString object, i.e. new UnicodeString("string encoded in system locale").
I found this topic, so I know it can be done with the use of a stringstream.
So my question is: can it be done in some other, simpler way, without using a stringstream to convert?
I use
std::string converted;
us.toUTF8String(converted);
where us is an (ICU) UnicodeString.
You could use UnicodeString::extract() with a codepage (or a converter). Actually passing NULL for the codepage will use what ICU detected as the default codepage.
You could use the functions in ucnv.h -- namely void ucnv_fromUnicode (UConverter *converter, char **target, const char *targetLimit, const UChar **source, const UChar *sourceLimit, int32_t *offsets, UBool flush, UErrorCode *err). It's not a nice C++ API like UnicodeString, but it will work.
I'd recommend just sticking with the operator<< you're already using if at all possible. It's the standard way to handle lexical conversions (i.e. string to/from integers) in C++ in any case.
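For reference, the standard lexical-conversion idiom this answer alludes to is streaming through operator<<; a minimal generic sketch (the helper name is my own):

```cpp
#include <sstream>
#include <string>

// Convert any streamable value to its textual form via operator<<,
// the standard lexical-conversion idiom in C++.
template <typename T>
std::string to_string_via_stream(const T& value) {
    std::ostringstream oss;
    oss << value;
    return oss.str();
}
```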