Should I use _T or _TEXT on C++ string literals? - c++

For example:
// This will become either SomeMethodA or SomeMethodW,
// depending on whether _UNICODE is defined.
SomeMethod( _T( "My String Literal" ) );
// Becomes either AnotherMethodA or AnotherMethodW.
AnotherMethod( _TEXT( "My Text" ) );
I've seen both. _T seems to be for brevity and _TEXT for clarity. Is this merely a subjective programmer preference or is it more technical than that? For instance, if I use one over the other, will my code not compile against a particular system or some older version of a header file?

A simple grep of the SDK shows that the answer is it doesn't matter: they are the same. Both turn into __T(x).
C:\...\Visual Studio 8\VC>findstr /spin /c:"#define _T(" *.h
crt\src\tchar.h:2439:#define _T(x) __T(x)
include\tchar.h:2390:#define _T(x) __T(x)
C:\...\Visual Studio 8\VC>findstr /spin /c:"#define _TEXT(" *.h
crt\src\tchar.h:2440:#define _TEXT(x) __T(x)
include\tchar.h:2391:#define _TEXT(x) __T(x)
And for completeness:
C:\...\Visual Studio 8\VC>findstr /spin /c:"#define __T(" *.h
crt\src\tchar.h:210:#define __T(x) L ## x
crt\src\tchar.h:889:#define __T(x) x
include\tchar.h:210:#define __T(x) L ## x
include\tchar.h:858:#define __T(x) x
However, technically, when calling the Windows API you should be using TEXT() (defined in the SDK headers and keyed off UNICODE) rather than the CRT's _TEXT(), but it (eventually) expands to the same thing too.
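For comparison, here is a simplified sketch of how the SDK's WinNT.h defines TEXT(), keyed off UNICODE rather than _UNICODE (paraphrased, not copied from the header):
#ifdef UNICODE
#define __TEXT(quote) L ## quote
#else
#define __TEXT(quote) quote
#endif
#define TEXT(quote) __TEXT(quote)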

Commit to Unicode and just use L"My String Literal".

From Raymond Chen:
TEXT vs. _TEXT vs. _T, and UNICODE vs. _UNICODE
The plain versions without the underscore affect the character set the Windows header files treat as default. So if you define UNICODE, then GetWindowText will map to GetWindowTextW instead of GetWindowTextA, for example. Similarly, the TEXT macro will map to L"..." instead of "...".
The versions with the underscore affect the character set the C runtime header files treat as default. So if you define _UNICODE, then _tcslen will map to wcslen instead of strlen, for example. Similarly, the _TEXT macro will map to L"..." instead of "...".
What about _T? Okay, I don't know about that one. Maybe it was just to save somebody some typing.
Short version: _T() is a lazy man's _TEXT()
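A simplified sketch of the two independent switches Chen describes, paraphrasing what the SDK and CRT headers do rather than quoting them verbatim:
// Windows SDK headers key off UNICODE:
#ifdef UNICODE
#define GetWindowText GetWindowTextW
#else
#define GetWindowText GetWindowTextA
#endif
// CRT headers key off _UNICODE:
#ifdef _UNICODE
#define _tcslen wcslen
#else
#define _tcslen strlen
#endif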
Note: You need to be aware of what code-page your source code text editor is using when you write:
_TEXT("Some string containing Çontaining");
TEXT("€xtended characters.");
The bytes the compiler sees depends on the code page of your editor.

These macros are a holdover from the days when an application might actually have wanted to compile both a Unicode and an ANSI version.
There is no reason to do this today - it's all vestigial. Microsoft is stuck with supporting every possible configuration forever, but you aren't. If you are not compiling to both ANSI and Unicode (and no one is, let's be honest), just go with L"text".
And yes, in case it wasn't clear by now: _T == _TEXT

I've never seen anyone use _TEXT() instead of _T().

Neither. In my experience there are two basic types of string literals, those that are invariant, and those that need to be translated when your code is localized.
It's important to distinguish between the two as you write the code so you don't have to come back and figure out which is which later.
So I use _UT() for untranslatable strings, and ZZT() (or something else that is easy to search on) for strings that will need to be translated. Instances of _T() or _TEXT() in the code are evidence of string literals that have not yet been correctly categorized.
_UT and ZZT are both #defined to _TEXT
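A minimal sketch of that convention, assuming the answerer's macro names (they are not standard macros):
#include <tchar.h>
#define _UT(x) _TEXT(x)   // invariant literal, never shown to users
#define ZZT(x) _TEXT(x)   // literal that must be translated during localization
// Later, searching for ZZT( finds every string the translators need to see.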

Use neither, and also please don't use the L"..." crap.
Use UTF-8 for all strings, and convert them just before passing them to Microsoft APIs.
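A minimal sketch of that approach, assuming strings are kept as UTF-8 in std::string and converted only at the Windows API boundary (the helper name is made up):
#include <string>
#include <windows.h>
std::wstring Utf8ToWide(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();
    // Ask for the required length, then convert.
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), (int)utf8.size(), nullptr, 0);
    std::wstring wide(len, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), (int)utf8.size(), &wide[0], len);
    return wide;
}
// e.g. SetWindowTextW(hwnd, Utf8ToWide(title).c_str());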

Related

__int64 to CString returns wrong values - C++ MFC

I want to convert a __int64 variable into a CString. The code is exactly this
__int64 i64TotalGB;
CString totalSpace;
i64TotalGB = 150;
printf("disk space: %I64d GB\n", i64TotalGB);
totalSpace.Format(_T("%I64d", i64TotalGB));
printf("totalSpace contains: %s", totalSpace);
the first printf prints
"disk space: 150GB"
which it's correct, but the second printf prints randomly high numbers like
"totalSpace contains: 298070026817519929"
I also tried to use a INT64 variable instead of a __int64 variable, but the result is the same. What can be the cause of this?
Here:
totalSpace.Format(_T("%I64d", i64TotalGB));
you're passing i64TotalGB as an argument to the _T() macro instead of passing it as the second argument to Format().
Try this:
totalSpace.Format(_T("%I64d"), i64TotalGB);
Having said that, thanks to MS's mess (ha) around character encodings, using _T here is not the right thing, as CString is composed of a TCHAR and not a _TCHAR. So taking that into account, you might as well use TEXT() instead of _T(), as it depends on UNICODE and not _UNICODE:
totalSpace.Format(TEXT("%I64d"), i64TotalGB);
In addition, this line is wrong as it tries to pass an ATL CString as a char* (a.k.a. C-style string):
printf("totalSpace contains: %s", totalSpace);
For which the compiler gives this warning:
warning C4477: 'printf' : format string '%s' requires an argument of type 'char *', but variadic argument 1 has type 'ATL::CString'
While the structure of CString is practically compatible with passing it like you have, this is still formally undefined behavior. Use CString::GetString() to safeguard against it:
printf("totalSpace contains: %ls", totalSpace.GetString());
Note the %ls as under my configuration totalSpace.GetString() returned a const wchar_t*. However, as "printf does not currently support output into a UNICODE stream.", the correct version for this line, that will support characters outside your current code page, is a call to wprintf() in the following manner:
wprintf("totalSpace contains: %s", totalSpace.GetString());
Having said ALL that, here's some general advice, regardless of the direct problem behind the question. The far better practice nowadays is slightly different altogether, and I quote from the respectable answer by @IInspectable below, saying that "generic-text mappings were relevant 2 decades ago".
What's the alternative? In the absence of a good enough reason, stick explicitly to CStringW (a Unicode string type with CRT support). Prefer the L string-literal prefix over the archaic generic-text mappings that depend on whether _UNICODE or _MBCS has been defined in your program. Likewise, prefer the wide-character versions of API and language library calls, such as wprintf() instead of printf().
The bug is a result of numerous issues with the code, specifically these 2:
totalSpace.Format(_T("%I64d", i64TotalGB));
This uses the _T macro in a way it's not meant to be used. It should wrap a single character-string literal; in the code it wraps the second argument as well.
printf("totalSpace contains: %s", totalSpace);
This assumes an ANSI-encoded character string, but passes a CString object, which (depending on the build) stores either ANSI- or Unicode-encoded characters.
The recommended course of action is to drop generic-text mappings altogether, in favor of using Unicode (that's UTF-16LE on Windows) throughout1. The generic-text mappings were relevant 2 decades ago, to ease porting of Win9x code to the Windows NT based products.
To do this
Choose CStringW over CString.
Drop all occurrences of _T, TEXT, and _TEXT, and replace them with an L prefix.
Use the wide-character version of the Windows API, CRT, and C++ Standard Library.
The fixed code looks like this:
__int64 i64TotalGB;
CStringW totalSpace; // Use wide-character string
i64TotalGB = 150;
printf("disk space: %I64d GB\n", i64TotalGB);
totalSpace.Format(L"%I64d", i64TotalGB); // Use wide-character string literal
wprintf(L"totalSpace contains: %s", totalSpace.GetString()); // Use wide-character library
On an unrelated note, while it is technically safe to pass a CString object in place of a character pointer in a variable argument list, this is an implementation detail, and not formally documented to work. Call CString::GetString() if you care about correct code.
1 Unless there is a justifiable reason to use a character encoding that uses char as its underlying type (like UTF-8 or ANSI). In that case you should still be explicit about it by using CStringA.
Try this:
totalSpace.Format(_T("%I64d"), i64TotalGB);

define string at compiler options

Using Tornado 2.2.1 GNU
In the C/C++ compiler options I'm trying to define a string as follows:
-DHELLO="Hello" and it doesn't work (it also failed for -DHELLO=\"Hello\" and for -DHELLO=\\"Hello\\", which works on other platforms).
Defining a numeric value, -DVALUE=12, works without issue.
Does anybody know the proper way to define a string in Tornado?
The problem with such a macro is that it normally isn't a string (in the C/C++ sense), just a preprocessor symbol. With numbers it does work, because a preprocessor number can be used in C/C++ as is, but to turn a string-like symbol into a C/C++ string literal (short of adding the escaped quotes) you need to "stringize" it.
So, this should work (without extra escaped quotes):
#define _STRINGIZE(x) #x
#define STRINGIZE(x) _STRINGIZE(x)
string s = STRINGIZE(HELLO);
(note the double expansion to get the value of the macro stringized, i.e. "Hello", instead of the macro name itself, i.e. "HELLO")
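Putting it together, a small self-contained sketch, assuming the compiler was invoked with -DHELLO=Hello:
#include <iostream>
#include <string>
#define _STRINGIZE(x) #x
#define STRINGIZE(x) _STRINGIZE(x)
int main()
{
    std::string s = STRINGIZE(HELLO); // expands to "Hello"
    std::cout << s << std::endl;
}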

Using Unicode in a C++ source file

I'm working with a C++ sourcefile in which I would like to have a quoted string that contains Asian Unicode characters.
I'm working with QT on Windows, and the QT Creator development environment has no problem displaying the Unicode. The QStrings also have no problem storing Unicode. When I paste in my Unicode, it displays fine, something like:
#define MY_STRING 鸟
However, when I save, my lovely Unicode characters all become ? marks.
I tried to open up the source file and resave it as Unicode encoded. It then displays and saves correctly in QT Creator. However, on compile, it seems like the compiler has no idea what to do with this, and throws a ton of misguided errors and warnings, such as "stray \255 in program" and "null character(s) ignored".
What's the correct way to include Unicode in C++ source files?
Personally, I don't use any non-ASCII characters in source code. The reason is that if you use arbitrary Unicode characters in your source files, you have to worry about the encoding the compiler considers the source file to be in, what execution character set it will use, and how it's going to do the source-to-execution character set conversion.
I think it's a much better idea to have Unicode data in some sort of resource file, which could be compiled to static data at compile time or loaded at runtime for maximum flexibility. That way you can control how the encoding occurs, and not worry about how the compiler behaves, which may be influenced by the local locale settings at compile time.
It does require a bit more infrastructure, but if you're having to internationalize it's well worth spending the time choosing or developing a flexible and robust strategy.
While it's possible to use universal character escapes (L'\uXXXX') or explicitly encoded byte sequences ("\xXX\xYY\xZZ") in source code, this makes Unicode strings virtually unreadable for humans. If you're having translations made it's easier for most people involved in the process to be able to deal with text in an agreed universal character encoding scheme.
Using the L prefix and \u or \U notation for escaping Unicode characters:
Section 6.4.3 of the C99 specification defines the \u escape sequences.
Example:
#define MY_STRING L"A \u2261 B"
/* A congruent-to B (U+2261; the \u escape takes hex digits) */
Are you using a wchar_t interface? If so, you want L"\u1234" for a wide string containing Unicode character U+1234 (hex 0x1234). (Looking at the QString header file I think this is what you need.)
If not and your interface is UTF-8, then you'll need to encode your character in UTF-8 first and then create a narrow string containing those bytes, e.g. "\xE9\xB8\x9F" for U+9E1F.
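Since the question is about Qt, here is a small sketch of both routes into a QString, assuming the source file itself stays pure ASCII:
#include <QString>
QString a = QString::fromWCharArray(L"\u9E1F");  // from a wide literal
QString b = QString::fromUtf8("\xE9\xB8\x9F");   // from the same character as UTF-8 bytes
// Both hold the single character U+9E1F (the bird character from the question).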

Unicode Basics on Windows

I have a C++ library which I deliver to other developers. One of them needs i18n, so he asked me if I could add L prefix to the strings in the API.
I don't know much about i18n so I have some basic questions:
When I compile my lib with Unicode, can other developers use this build as usual? Or do they also need to change their Visual Studio settings to use Unicode?
When I compile my lib with Unicode, do I need to change all the strings in headers and .cpp files? Or is it sufficient to add the L prefix to strings in header files?
Thanks in advance!
Paul
Adding the L prefix changes the string from an array of char into an array of wchar_t (16 bits on Windows). A better alternative is to wrap all your strings with the TEXT macro, i.e.
TEXT("My string")
If your build is a Unicode build, all your strings become arrays of wchar_t, but if not, they remain arrays of char. Windows also provides the following types:
LPWSTR = wchar_t *
LPTSTR = wchar_t *, or char * if UNICODE is not defined
LPSTR = char *
Don't forget though; even though you've prefixed L or wrapped TEXT around your strings, you need to make sure you're calling the right functions. Standard Windows string APIs such as lstrlen automatically switch from char * to wchar_t * when UNICODE is defined, but you'll need to make sure you're not using functions that only accept char *.
Functions that your library exports that take strings will also break older applications that use your library, since those applications will still be passing arrays of char rather than wchar_t, so you'll probably want to work some sort of backwards compatibility in there.
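A minimal sketch of that kind of backwards compatibility, with made-up function names: export both an A and a W version and let a macro pick one for callers, the same way the Windows headers do.
#include <windows.h>
#include <string>
void MyLibSetLabelW(const wchar_t* text);   // the real, wide-character implementation
void MyLibSetLabelA(const char* text)       // thin ANSI wrapper for older callers
{
    // Convert the caller's ANSI string to wide characters, then forward it.
    int len = MultiByteToWideChar(CP_ACP, 0, text, -1, nullptr, 0); // length includes the null
    std::wstring wide(len, L'\0');
    MultiByteToWideChar(CP_ACP, 0, text, -1, &wide[0], len);
    MyLibSetLabelW(wide.c_str());
}
#ifdef UNICODE
#define MyLibSetLabel MyLibSetLabelW
#else
#define MyLibSetLabel MyLibSetLabelA
#endif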
There's a lot more than Unicode support to internationalization (i18n). Off the top of my head, there is:
Currency
Number representation
Text encodings (partially abstracted by the use of Unicode)
Right-to-left scripts
Text translation mechanisms
Most of this is available in some form or another through APIs on Windows, whether it be Win32 or .Net, etc. I suggest you take a look at:
Microsoft .Net Internationalization
The Microsoft Win32 Internationalization Checklist

How to use wide string literals in c++ without putting L in front of each one

You'll have to forgive my ignorance, but I'm not used to using wide character sets in c++, but is there a way that I can use wide string literals in c++ without putting an L in front of each literal?
If so, how?
No, there isn't. You have to use the L prefix (or a macro such as _T() with VC++ that expands to L anyway when compiled for Unicode).
The new C++0x Standard defines another way of doing this:
http://en.wikipedia.org/wiki/C%2B%2B0x#New_string_literals
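For reference, a quick sketch of the literal forms that standard added (they still involve a prefix, just not necessarily L):
const char*     utf8  = u8"UTF-8 text";   // UTF-8 encoded
const char16_t* utf16 = u"UTF-16 text";   // UTF-16 encoded
const char32_t* utf32 = U"UTF-32 text";   // UTF-32 encoded
const wchar_t*  wide  = L"wide text";     // implementation-defined wide encoding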
on a related note..
i'm trying to do the following
#define get_switch( m ) myclass::getSwitch(L##m)
which is a macro that will expand
get_switch(isrunning)
into
myclass::getswitch(L"isrunning")
this works fine in C++ with Visual Studio 2008
but when I compile the same code under Mac Xcode (for iPhone) I get the error:
error: 'L' was not defined in this scope.
EDIT: Solution
#define get_switch( m ) myclass::getSwitch(L ## #m)
this works on both vc++ and mac xcode (gcc)
Why do you not want to prefix string literals with an L? It's quite simple - strings without an L are ANSI strings (const char*), strings with an L are wide-character strings (const wchar_t*). There is the TEXT() macro, which makes a string literal into an ANSI or a wide-character string depending on whether the current project is set to use Unicode:
#ifdef UNICODE
#define TEXT(s) L ## s
#else
#define TEXT(s) s
#endif
There's also the _T() macro, which expands the same way but is keyed off _UNICODE rather than UNICODE (see the first question above).