MFC, Unicode and DDX_Text - c++

UTF-8 Everywhere makes a strong case for shunning Microsoft's TCHAR, _T(), LPCTSTR and so forth completely, pushing wchar_t aside as well, and bravely embracing a world of UTF-8 strings based on the narrow char type.
Which seemed fine until I came to the MFC DDX_Text() function for getting a CString both into and out of an edit control. Is there any reasonable way to:
Declare CStringA myString, intended as a UTF-8 string (or as an ASCII/ANSI string in the degenerate case)
Compile with UNICODE defined
Pass myString through suitable conversions and/or temporary variables and into the third parameter to DDX_Text(), and get sensible results to and from the associated edit control?
If not, how would you recommend handling string input/output via edit controls if your application wanted to use UTF-8 (or ASCII/ANSI in the degenerate case)?
(P.S. this is motivated by Visual Studio 2013 encouraging Unicode-only use of MFC. Given an MFC app, and a desire to use VS2013, this requires UNICODE to be defined... or to cling on to a deprecated way of doing things.)

Windows internally uses UTF-16 as its Unicode encoding, so you will have to follow that and use CString, which is defined as CStringW when UNICODE is defined. You also have to use the _T() macro. All Windows common controls (edit boxes, list boxes, etc.) use UTF-16 as well.
I'd suggest using UTF-8 for networking stuff only.
// UTF-8 <-> UTF-16 conversion helpers built on the ATL conversion classes
// CW2A / CA2W (declared in <atlconv.h>)
CStringA CUtility::UTF16toUTF8(const CStringW& utf16)
{
    // Convert the wide (UTF-16) string to the UTF-8 code page
    return CW2A(utf16, CP_UTF8);
}
CStringW CUtility::UTF8toUTF16(const CStringA& utf8)
{
    // Interpret the narrow string as UTF-8 and convert it to UTF-16
    return CA2W(utf8, CP_UTF8);
}

Related

Should string encoding for library conform to Unicode or flexible?

I have created a library in C++ which exposes C-style interface APIs. Some of the arguments are strings, so they would be char *. Now I know they should all be Unicode, but because it is a library I don't think I want to force users to decide one way or the other. Ideally I thought it would be best to use TCHAR so I can build it either way, for Unicode and for ASCII users. Then I read this and it opposes the idea in general.
As an example of API, the strings are filenames or error messages like below.
void LoadSomeFile(char * fileName );
const char * GetErrorMsg();
I am using C++ and the STL. There is this debate of std::string vs. std::wstring as well.
Personally I really like MFC's CString class, which takes care of all this nicely, but that means I have to use MFC just for its string class.
Now I think TCHAR is probably the best solution for me, but do I have to use CString (internally) for that to work? Can I use it with an STL string? As far as I can see, it is either string or wstring there.
The TCHAR type is an unfortunate design choice that has thankfully been left behind us. Nobody has to use TCHAR any more, thank goodness. The Unicode choice has been made for us as well: Unicode is the only sane choice going forwards.
The question is, is your library Windows-only? Or is it portable?
If your library is portable, then the typical choice is char * or std::string with UTF-8 encoded strings. For more information, see UTF-8 Everywhere. The summary is that wchar_t is UTF-16 on Windows but UTF-32 everywhere else, which makes it almost useless for cross-platform programming.
If your library runs on Win32 only, then you may feel free to use wchar_t instead. On Windows, wchar_t is UTF-16.
Don't use both; it will make your code and API bloated and difficult to read. TCHAR was a hack for migrating Win32 code to Unicode.
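As a minimal sketch of what that looks like in practice on Windows (the conversion helper and the CreateFileW call are illustrative; LoadSomeFile is the example API from the question):
#include <string>
#include <windows.h>

// Convert a UTF-8 std::string to a UTF-16 std::wstring at the Win32 boundary
static std::wstring Utf8ToWide(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), (int)utf8.size(), nullptr, 0);
    std::wstring wide(len, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), (int)utf8.size(), &wide[0], len);
    return wide;
}

void LoadSomeFile(const char* fileName)   // public API stays narrow char / UTF-8
{
    std::wstring wideName = Utf8ToWide(fileName);
    HANDLE file = CreateFileW(wideName.c_str(), GENERIC_READ, FILE_SHARE_READ,
                              nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file != INVALID_HANDLE_VALUE)
    {
        // ... actual loading would happen here ...
        CloseHandle(file);
    }
}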

Differences using std::string in C++ Builder and VC++

Since I can get my hands on the new RAD Studio XE4, I thought I'd give it a try.
Unfortunately, I am not very experienced with C++, and therefore I was wondering why code that works perfectly fine in VC++ doesn't work at all in C++ Builder.
Most of the problems are with converting between different variable types.
For example :
std::string Test = " ";
GetFileAttributes(Test.c_str());
works in VC++, but in C++ Builder it won't compile, telling me "E2034 Cannot convert 'const char *' to 'wchar_t *'".
Am I missing something? Why doesn't the same code work the same way on all compilers?
Thanks
Welcome to Windows Unicode/ASCII hell.
The function
GetFileAttributes
is actually a macro that expands to either GetFileAttributesA or GetFileAttributesW, depending on whether you have UNICODE defined when you include the Windows headers (the related _UNICODE macro controls the C runtime's <tchar.h> mappings). The *A variants take char* and related arguments; the *W variants take wchar_t* and related arguments.
I suggest calling only the wide *W variants directly in new code. This would mean switching to std::wstring for Windows-only code, and making some well-thought-out design choices for a cross-platform application.
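For example, a minimal sketch of a Windows-only helper written that way (the function name is illustrative):
#include <string>
#include <windows.h>

bool FileExists(const std::wstring& path)
{
    // Calling the *W variant explicitly means this compiles the same way
    // whether or not UNICODE is defined.
    return GetFileAttributesW(path.c_str()) != INVALID_FILE_ATTRIBUTES;
}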
Your C++ Builder config is set to use the UNICODE character set, which means that Win32 APIs resolve to their wide-character versions. Therefore you need to use wide-char strings in your C++ code. If you set your VS config to use UNICODE, you would get the same error.
You can try this:
// wstring = basic_string<wchar_t>
// in a Unicode build, the _T macro turns the literal into a wide-char literal
std::wstring Test = _T(" ");
GetFileAttributes(Test.c_str()); // c_str() now returns const wchar_t*, not const char*
See more details about _T/_TEXT macros here: http://docwiki.embarcadero.com/RADStudio/XE3/en/TCHAR_Mapping
You have defined _UNICODE and/or UNICODE in Builder and not defined them in VC.
Most Windows APIs come in two flavours: the ANSI flavour and the Unicode flavour.
For example, when you call SetWindowText, there really is no SetWindowText function. Instead there are two different functions:
- SetWindowTextA, which takes an ANSI string
and
- SetWindowTextW, which takes a Unicode (UTF-16) string.
If your program is compiled with /DUNICODE /D_UNICODE, SetWindowText maps to SetWindowTextW, which expects a const wchar_t *.
If your program is compiled without these macros defined, it maps to SetWindowTextA which takes a const char *.
The windows headers typically do something like this to make this happen.
#ifdef UNICODE
#define SetWindowText SetWindowTextW
#else
#define SetWindowText SetWindowTextA
#endif
Likewise, there are two GetFileAttributes functions:
DWORD WINAPI GetFileAttributesA(LPCSTR lpFileName);
DWORD WINAPI GetFileAttributesW(LPCWSTR lpFileName);
In VC, you haven't defined UNICODE/_UNICODE, and hence you are able to pass std::string::c_str(), which returns a const char *.
In Builder, you probably have defined UNICODE/_UNICODE, and it expects a const wchar_t *.
You may not have done this UNICODE/_UNICODE thing explicitly - maybe the IDE is doing it for you - so check the options in the IDE.
You have several ways of fixing this:
Find the UNICODE/_UNICODE option in the IDE and disable it.
or
Use std::wstring - then c_str() will return a const wchar_t *.
or
Call GetFileAttributesA directly instead of GetFileAttributes - you will need to do this for every other Windows API that comes in these two variants (a sketch follows below).
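A minimal sketch of that last option (the function name is illustrative); note that the *A variants interpret the string in the system ANSI code page, not UTF-8:
#include <string>
#include <windows.h>

bool FileExistsAnsi(const std::string& path)
{
    // Naming the ANSI variant explicitly keeps accepting const char*
    // even in a UNICODE build.
    return GetFileAttributesA(path.c_str()) != INVALID_FILE_ATTRIBUTES;
}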

GetWindowText with char[]

I am quite new to Windows programming. I am trying to retrieve the name of a window.
char NewName[128];
GetWindowText(hwnd, NewName, 128);
I need to use a char[], but it gives me a type-mismatch error.
From what I read, LPWSTR is a kind of char*.
How can I use a char[] with GetWindowText?
Thanks a lot !
You are probably compiling a Unicode project, so you can either:
Explicitly call the ANSI version of the function (GetWindowTextA), or
Use wchar_t instead of char (LPWSTR is a pointer to wchar_t)
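A minimal sketch of both options (hwnd and the buffer size are illustrative):
#include <windows.h>

void ReadTitle(HWND hwnd)
{
    // Option 1: keep char[] and call the ANSI version explicitly
    char nameA[128];
    GetWindowTextA(hwnd, nameA, 128);

    // Option 2: switch to wchar_t[], which matches the wide mapping of GetWindowText
    wchar_t nameW[128];
    GetWindowTextW(hwnd, nameW, 128);
}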
For modern Windows programming (that means after the year 2000, when Microsoft introduced the Microsoft Layer for Unicode for Windows 9x), you're far better off using "Unicode", which in C++ on Windows means using wchar_t.
That is, use wchar_t instead of char, and use std::wstring instead of std::string.
Remember to define UNICODE before including <windows.h>. It's also a good idea to define NOMINMAX and STRICT. Although nowadays the latter is defined by default.
When calling Windows APIs without specifying an explicit version (i.e. without appending either A for ANSI or W for wide char), you should always use TCHAR; it maps to the correct character type depending on whether UNICODE is #defined or not.

proper style for interfacing with legacy TCHAR code

I'm modifying someone else's code which uses TCHAR extensively. Is it better form to just use std::wstring in my code? wstring should be equivalent to TString on wide-char platforms, so I don't see an issue. The rationale being, it's easier to use a raw wstring than to support TCHAR... e.g., using boost::wformat.
Which style will be clearer to the next maintainer? I wasted several hours myself trying to understand the string intricacies; it seems just using wstring would cut out half of the stuff you need to understand.
typedef std::basic_string<TCHAR> TString; //on winxp, TCHAR resolves to wchar_t
typedef basic_string<wchar_t, char_traits<wchar_t>, allocator<wchar_t> > wstring;
...the only difference is the allocator.
In the unlikely case that your program lands on a Windows 9x machine, there's still an API layer that can translate your UTF-16 strings to 8-bit chars. There's no point left in using TCHAR for new code development. (source)
If you are only intending on targeting Unicode (wchar_t) platforms, you are better off using std::wstring. If you want to support both multibyte and Unicode builds, you will need to use TString and similar.
Also note that basic_string defaults the char_traits and allocator based on the passed-in character type, so on builds where UNICODE/_UNICODE is defined, TString and wstring will be the same type.
NOTE: If you are just passing the arguments to various APIs and not doing any manipulations on them, you are better off using const wchar_t * instead of std::wstring directly (especially if mixing Win32, COM and standard C++ code) as you will end up doing less conversions and copying.
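A small sketch of that interface style (the function names are illustrative):
#include <string>
#include <windows.h>

// Taking const wchar_t* rather than std::wstring avoids constructing a
// temporary wstring when the caller already has a literal or raw pointer.
void LogMessage(const wchar_t* text)
{
    OutputDebugStringW(text);
}

void Caller()
{
    std::wstring msg = L"from a wstring";
    LogMessage(msg.c_str());        // works with wstring
    LogMessage(L"from a literal");  // and with literals, no conversion or copy
}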
TCHAR used to be more important when you were going to compile the binaries twice, once for char and a second time for wchar_t.
You can still make this choice if you like, changing the MSVC project settings from MBCS to Unicode and back.
This also means when calling the windows API you will have the matching data type.
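A minimal sketch of how the legacy TCHAR code and new wstring code can meet, assuming a Unicode build (UNICODE/_UNICODE defined); the function names are illustrative:
#include <string>
#include <windows.h>

typedef std::basic_string<TCHAR> TString;   // the legacy typedef from the question

// Existing TCHAR-based code stays as it is
void LegacySetTitle(HWND hwnd, const TString& title)
{
    SetWindowText(hwnd, title.c_str());     // maps to SetWindowTextW in a Unicode build
}

// New code sticks to std::wstring; in a Unicode build TString and std::wstring
// are the same type, so it can be passed straight through without conversion.
void NewCode(HWND hwnd)
{
    std::wstring title = L"Hello";
    LegacySetTitle(hwnd, title);
}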

Would std::basic_string<TCHAR> be preferable to std::wstring on Windows?

As I understand it, Windows #defines TCHAR as the correct character type for your application based on the build - so it is wchar_t in UNICODE builds and char otherwise.
Because of this I wondered if std::basic_string<TCHAR> would be preferable to std::wstring, since the first would theoretically match the character type of the application, whereas the second would always be wide.
So my question is essentially: Would std::basic_string<TCHAR> be preferable to std::wstring on Windows? And, would there be any caveats (i.e. unexpected behavior or side effects) to using std::basic_string<TCHAR>? Or, should I just use std::wstring on Windows and forget about it?
I believe the time when it was advisable to release non-Unicode versions of your application (to support Win95, or to save a KB or two) is long past: nowadays the underlying Windows systems you'll support are going to be Unicode-based (so using char-based system interfaces will actually complicate the code by interposing a shim layer from the library), and it's doubtful whether you'd save any space at all. Go std::wstring, young man!-)
I have done this on very large projects and it works great:
namespace std
{
#ifdef _UNICODE
    typedef wstring tstring;
#else
    typedef string tstring;
#endif
}
You can use wstring everywhere instead if you'd like, provided you never need to compile using a multi-byte character string. I don't think any modern application ever needs to support multi-byte character strings, though.
Note: the std namespace is supposed to be off limits (adding your own names to it is technically undefined behaviour), but I have not had any problems with the above method for several years.
One thing to keep in mind: if you decide to use std::wstring all the way through your program, you might still need to use std::string if you are communicating with other systems that use UTF-8.
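A minimal sketch of that boundary conversion using the Win32 WideCharToMultiByte call (the helper name is illustrative; error handling is omitted):
#include <string>
#include <windows.h>

// Convert a UTF-16 std::wstring to a UTF-8 std::string for external systems
std::string WideToUtf8(const std::wstring& wide)
{
    if (wide.empty()) return std::string();
    int len = WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), (int)wide.size(),
                                  nullptr, 0, nullptr, nullptr);
    std::string utf8(len, '\0');
    WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), (int)wide.size(),
                        &utf8[0], len, nullptr, nullptr);
    return utf8;
}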