_T() macro changes for UNICODE character data - C++

I have a UNICODE application in which we use _T(x), which is defined as follows.
#if defined(_UNICODE)
#define _T(x) L ##x
#else
#define _T(x) x
#endif
I understand that the L prefix produces wchar_t data, which will be 4 bytes on any platform. Please correct me if I am wrong. My requirement is that I need L to produce 2-byte data. So, as a compiler hack, I started using the -fshort-wchar gcc flag. But now I need to move my application to zSeries, where the -fshort-wchar flag has no effect.
In order to port my application to zSeries, I need to modify the _T() macro in such a way that, even when using L ##x and without the -fshort-wchar flag, I get 2-byte wide character data. Can someone tell me how I can change the definition of L so that it is always 2 bytes in my application?

You can't - not without C++0x support. C++0x defines the following ways of declaring string literals:
"string of char characters in some implementation defined encoding" - char
u8"String of utf8 chars" - char
u"string of utf16 chars" - char16_t
U"string of utf32 chars" - char32_t
L"string of wchar_t in some implementation defined encoding" - wchar_t
Until C++0x is widely supported, the only way to encode a UTF-16 string in a cross-platform way is to break it up into individual character constants:
// make a char16_t type to stand in until msvc/gcc/etc supports
// c++0x utf string literals
#ifndef CHAR16_T_DEFINED
#define CHAR16_T_DEFINED
typedef unsigned short char16_t;
#endif
const char16_t strABC[] = { 'a', 'b', 'c', '\0' };
// the same declaration would work for a type that changes from 8 to 16 bits:
#ifdef _UNICODE
typedef char16_t TCHAR;
#else
typedef char TCHAR;
#endif
const TCHAR strABC2[] = { 'a', 'b', 'c', '\0' };
The _T macro can only deliver the goods on platforms where wchar_t is 16 bits wide. And the alternative is still not truly cross-platform: the encoding of char and wchar_t is implementation defined, so 'a' does not necessarily encode the Unicode code point for 'a' (0x61). Thus, to be strictly accurate, this is the only way of writing the string:
const TCHAR strABC[] = { '\x61', '\x62', '\x63', '\0' };
Which is just horrible.

Ah! The wonders of portability :-)
If you have a C99 compiler for all your platforms, use int_least16_t, uint_least16_t, ... from <stdint.h>. Most platforms also define int16_t but it's not required to exist (if the platform is capable of using exactly 16 bits at a time, the typedef int16_t must be defined).
Now wrap all the strings in arrays of uint_least16_t and make sure your code does not expect values of uint_least16_t to wrap at 65535 ...
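As a rough sketch of that suggestion (the names strABC and u16len here are made up for the example):
#include <cstdint>
#include <cstddef>
// code units guaranteed to be at least 16 bits wide
const std::uint_least16_t strABC[] = { 0x61, 0x62, 0x63, 0 }; // "abc"
// a tiny length helper, since the standard str* functions don't apply
std::size_t u16len(const std::uint_least16_t* s) {
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}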

Related

How to port Windows C++ that handles unicode with tchar.h to iOS app

I have some C++ code that I need to integrate into an iOS app. The Windows C++ code handles Unicode using tchar.h. I have made the following defines for iOS:
#include <wchar.h>
#define _T(x) x
#define TCHAR char
#define _tremove unlink
#define _stprintf sprintf
#define _sntprintf vsnprintf
#define _tcsncpy wcsncpy
#define _tcscpy wcscpy
#define _tcscmp wcscmp
#define _tcsrchr wcsrchr
#define _tfopen fopen
When trying to build the app, many of these are either missing (e.g. wcscpy) or have the wrong arguments. The coder responsible for the C++ code said I should use char instead of wchar_t, so I defined TCHAR as char. Does anyone have a clue as to how this should be done?
The purpose of TCHAR (and FYI, it is _TCHAR in the C runtime, TCHAR is for the Win32 API) is to allow code to switch between either char or wchar_t APIs at compile time, but your defines are mixing them together. The wcs functions are for wchar_t, so you need to change those defines to the char counterparts, to match your other char-based defines:
#define _tcsncpy strncpy
#define _tcscpy strcpy
#define _tcscmp strcmp
#define _tcsrchr strrchr
Also, you are mapping _sntprintf to the wrong C function. It needs to be mapped to snprintf() instead of vsnprintf():
#define _sntprintf snprintf
snprintf() and vsnprintf() are declared very differently:
int snprintf ( char * s, size_t n, const char * format, ... );
int vsnprintf (char * s, size_t n, const char * format, va_list arg );
Which is likely why you are getting "wrong arguments" errors.
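Putting the corrections together, a char-based mapping for iOS could look like this (a sketch based on the defines above; trim it to whatever your code actually uses):
#include <cstdio>   // sprintf, snprintf, fopen
#include <cstring>  // strncpy, strcpy, strcmp, strrchr
#include <unistd.h> // unlink
#define _T(x) x
#define TCHAR char
#define _tremove unlink
#define _stprintf sprintf
#define _sntprintf snprintf
#define _tcsncpy strncpy
#define _tcscpy strcpy
#define _tcscmp strcmp
#define _tcsrchr strrchr
#define _tfopen fopen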

How to get %AppData% path as std::string?

I've read that one can use SHGetSpecialFolderPath() to get the AppData path. However, it returns a TCHAR array, and I need an std::string.
How can it be converted to an std::string?
Update
I've read that it is possible to use getenv("APPDATA"), but that it is not available in Windows XP. I want to support Windows XP - Windows 10.
The T in the TCHAR type means that SHGetSpecialFolderPath is really a pair of functions:
SHGetSpecialFolderPathA for Windows ANSI encoded char based text, and
SHGetSpecialFolderPathW for UTF-16 encoded wchar_t based text, Windows' “Unicode”.
The ANSI variant is just a wrapper for the Unicode variant, and it cannot logically produce a correct path in all cases.
But this is what you need to use for char based data.
An alternative is to use the wide variant of the function, and use whatever machinery that you're comfortable with to convert the wide text result to a byte-oriented char based encoding of your choice, e.g. UTF-8.
Note that UTF-8 strings can't be used directly to open files etc. via the Windows API, so this approach involves even more conversion just to use the string.
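If you take that wide-then-convert route, a minimal sketch (SHGetSpecialFolderPathW plus WideCharToMultiByte, with only basic error handling) might be:
#include <windows.h>
#include <shlobj.h>
#include <string>

// Fetch %AppData% as a UTF-8 std::string; returns "" on failure.
std::string getAppDataUtf8() {
    wchar_t wide[MAX_PATH];
    if (!SHGetSpecialFolderPathW(NULL, wide, CSIDL_APPDATA, FALSE)) {
        return "";
    }
    int len = WideCharToMultiByte(CP_UTF8, 0, wide, -1, NULL, 0, NULL, NULL);
    if (len <= 0) {
        return "";
    }
    std::string utf8(len, '\0');
    WideCharToMultiByte(CP_UTF8, 0, wide, -1, &utf8[0], len, NULL, NULL);
    utf8.resize(len - 1); // drop the trailing null written by the API
    return utf8;
}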
However, I recommend switching over to wide text, in Windows.
For this, define the macro symbol UNICODE before including <windows.h>.
That's also the default in a Visual Studio project.
https://msdn.microsoft.com/en-gb/library/windows/desktop/dd374131%28v=vs.85%29.aspx
#ifdef UNICODE
typedef wchar_t TCHAR;
#else
typedef char TCHAR;
#endif
Basically you can convert this array to std::wstring. Converting to std::string is then straightforward with std::wstring_convert.
http://en.cppreference.com/w/cpp/locale/wstring_convert
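A minimal sketch of that conversion (std::wstring_convert with std::codecvt_utf8_utf16, which is standard C++11, though deprecated since C++17):
#include <codecvt>
#include <locale>
#include <string>

// Convert Windows UTF-16 wchar_t text to a UTF-8 std::string.
std::string narrow(const std::wstring& wide) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> conv;
    return conv.to_bytes(wide);
}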
You should use SHGetSpecialFolderPathA() to have the function deal with ANSI characters explicitly.
Then, just convert the array of char to std::string as usual.
/* to have MinGW declare SHGetSpecialFolderPathA() */
#if !defined(_WIN32_IE) || _WIN32_IE < 0x0400
#undef _WIN32_IE
#define _WIN32_IE 0x0400
#endif
#include <shlobj.h>
#include <string>
std::string getPath(int csidl) {
    char out[MAX_PATH];
    if (SHGetSpecialFolderPathA(NULL, out, csidl, 0)) {
        return out;
    } else {
        return "";
    }
}
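Usage might then look like this (CSIDL_APPDATA is the CSIDL constant for %AppData%):
std::string appData = getPath(CSIDL_APPDATA);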
Typedef String as either std::string or std::wstring depending on your compilation configuration. The following code might be useful:
#ifndef UNICODE
typedef std::string String;
#else
typedef std::wstring String;
#endif

Xerces-c and cross-platform string literals

I'm porting a code-base that uses Xerces-c for XML processing from Windows/VC++ to Linux/G++.
On Windows, Xerces-c uses wchar_t as the character type XMLCh. This has allowed people to use std::wstring and string literals with the L"" syntax.
On Linux/G++, wchar_t is 32-bit and Xerces-c uses unsigned short int (16-bit) as the character type XMLCh.
I've started out along this track:
#ifdef _MSC_VER
using u16char_t = wchar_t;
using u16string_t = std::wstring;
#elif defined __linux
using u16char_t = char16_t;
using u16string_t = std::u16string;
#endif
Unfortunately, char16_t and unsigned short int are not equivalent and their pointers are not implicitly convertible. So passing u"Hello, world." to Xerces functions still results in invalid conversion errors.
It's starting to look like I'm going to have to explicitly cast every string I pass to Xerces functions. But before I do, I wanted to ask if anyone knows a saner way to write cross-platform Xerces-c code.
The answer is that no, no-one has a good idea on how to do this. For anyone else who finds this question, this is what I came up with:
#ifdef _MSC_VER
#define U16S(x) L##x
#define U16XS(x) L##x
#define XS(x) x
#define US(x) x
#elif defined __linux
#define U16S(x) u##x
#define U16XS(x) reinterpret_cast<const unsigned short *>(u##x)
inline unsigned short *XS(char16_t *x) {
    return reinterpret_cast<unsigned short *>(x);
}
inline const unsigned short *XS(const char16_t *x) {
    return reinterpret_cast<const unsigned short *>(x);
}
inline char16_t *US(unsigned short *x) {
    return reinterpret_cast<char16_t *>(x);
}
inline const char16_t *US(const unsigned short *x) {
    return reinterpret_cast<const char16_t *>(x);
}
#include "char16_t_facets.hpp"
#endif
namespace SafeStrings {
#if defined _MSC_VER
using u16char_t = wchar_t;
using u16string_t = std::wstring;
using u16sstream_t = std::wstringstream;
using u16ostream_t = std::wostream;
using u16istream_t = std::wistream;
using u16ofstream_t = std::wofstream;
using u16ifstream_t = std::wifstream;
using filename_t = std::wstring;
#elif defined __linux
using u16char_t = char16_t;
using u16string_t = std::basic_string<char16_t>;
using u16sstream_t = std::basic_stringstream<char16_t>;
using u16ostream_t = std::basic_ostream<char16_t>;
using u16istream_t = std::basic_istream<char16_t>;
using u16ofstream_t = std::basic_ofstream<char16_t>;
using u16ifstream_t = std::basic_ifstream<char16_t>;
using filename_t = std::string;
#endif
} // namespace SafeStrings
char16_t_facets.hpp has definitions of the template specialisations std::ctype<char16_t>, std::numpunct<char16_t>, std::codecvt<char16_t, char, std::mbstate_t>. It's necessary to add these to the global locale, along with std::num_get<char16_t> and std::num_put<char16_t> (but it's not necessary to provide specialisations for these). The code for codecvt is the only bit that's difficult, and a reasonable template can be found in the GCC 5.0 libraries (if you use GCC 5, you don't need to provide the codecvt specialisation as it's already in the library).
Once you've done all of that, the char16_t streams will work correctly.
Then, every time you define a wide string, instead of L"string", write U16S("string"). Every time you pass a string to Xerces, write XS(string.c_str()) or U16XS("string") for literals. Every time you get a string back from Xerces, convert it back as u16string_t(US(call_xerces_function())).
Note that it is also possible to recompile Xerces-C with the character type set to char16_t. This removes a lot of the effort required above. BUT you won't be able to use any other library on the system that in turn depends on Xerces-C. Linking to any such library will cause link errors (because changing the character type changes many of the Xerces function signatures).
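For illustration, typical call sites might then look like this (takesXMLCh and returnsXMLCh are hypothetical stand-ins for whatever Xerces-C functions your code actually calls):
using namespace SafeStrings;

u16string_t name = U16S("example");          // portable 16-bit literal
takesXMLCh(XS(name.c_str()));                // pass a string to Xerces
takesXMLCh(U16XS("a literal"));              // or pass a literal directly
u16string_t back(US(returnsXMLCh()));        // convert a result back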

error C2664: 'CComboBox::InsertString' : cannot convert parameter 2 from 'const char [4]' to 'LPCTSTR'

I am trying to do the following:
class sig
{
CComboBox objList;
void SetDefault();
};
void sig :: SetDefault()
{
objList.InsertString(0, METHOD_ONE);
}
I have defined METHOD_ONE in a different class as
#define METHOD_ONE "OFF"
And I get the above error.
Can somebody please help me?
Cheers,
Chintan
The most important part is to understand the error; knowing what const char [4] means is the easy part, but what about LPCTSTR?
According to the Microsoft documentation:
An LPCWSTR if UNICODE is defined, an LPCSTR otherwise. For more
information, see Windows Data Types for Strings.
And the LPCWSTR is:
A pointer to a constant null-terminated string of 16-bit Unicode characters. For more information, see Character Sets Used By Fonts.
First, you must check what character encoding your program is using; it seems that you're using UNICODE, so in the end you're trying to convert a const pointer to char (the "OFF" constant) to a const pointer to wchar_t, and (logically) that conversion isn't allowed.
Then you can choose the correct string type; if UNICODE is defined, your #define must be a wide string:
// Note the L
#define METHOD_ONE L"OFF"
You can also define it this way:
#ifdef UNICODE
#define METHOD_ONE L"OFF"
#else
#define METHOD_ONE "OFF"
#endif
Or use the _T macro suggested by Roman R. All this macro does is prepend the L prefix to the text when UNICODE is defined:
#ifdef UNICODE
#define _T(x) L ##x
#else
#define _T(x) x
#endif
In the end, you must be aware of what kind of string you are using; Microsoft just hides it behind an obscure chain of #defines and typedefs.
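With METHOD_ONE defined in one of those UNICODE-aware ways, the original call then compiles unchanged:
void sig::SetDefault()
{
    // METHOD_ONE expands to L"OFF" in UNICODE builds and "OFF" otherwise
    objList.InsertString(0, METHOD_ONE);
}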

What is the difference between TCHAR and WCHAR?

I've opened the winnt.h header file and found these two lines:
typedef wchar_t WCHAR;
and
typedef WCHAR TCHAR, *PTCHAR;
but there was a comment on one of my posts saying that there is some difference between them. So what is the difference?
If you read the entire header, you will find:
#ifdef _UNICODE
typedef WCHAR TCHAR;
#else
typedef char TCHAR;
#endif
or words to that effect.
Perhaps MS has removed the narrow option of late.
TCHAR can be either char or WCHAR, depending on whether UNICODE is defined for the build. WCHAR is always a 16-bit Unicode character, i.e. wchar_t on Windows.
http://msdn.microsoft.com/en-us/library/aa383751%28VS.85%29.aspx
TCHAR:
A WCHAR if UNICODE is defined, a CHAR otherwise.
WCHAR:
A 16-bit Unicode character. For more information, see Character Sets Used By Fonts.
Technically speaking there is no difference, because you cannot typedef two different types to a single name. Let us see an example:
typedef char a;
typedef char b;
typedef a b, c;
This definition works, but if I change the above definition to this:
typedef char a;
typedef char * b;
typedef a b, c;
Error 1 error C2040: 'b' : 'a' differs in levels of indirection from 'char *'
Another one:
typedef char a;
typedef int b;
typedef a b, c;
Error 1 error C2371: 'b' : redefinition; different basic types
So, as these examples show, a name can only be typedef'd again to the same type.
TCHAR is a portable type which is char for ANSI projects and WCHAR (a 16-bit Unicode character) for UNICODE projects. Using TCHAR and TCHAR*/LPTSTR you can create a portable project which can easily be recompiled for either an ANSI or a Unicode build. But now that Windows 98/ME are obsolete and rarely used, there is little need to create non-Unicode executables.
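A small sketch of such TCHAR-portable code, using the standard tchar.h mappings, that compiles in both ANSI and Unicode configurations:
#include <windows.h>
#include <tchar.h>

// _T, TCHAR and _tcslen all switch between the char and wchar_t
// variants depending on whether _UNICODE/UNICODE is defined.
size_t greetingLength()
{
    LPCTSTR greeting = _T("hello"); // L"hello" or "hello"
    return _tcslen(greeting);       // maps to wcslen or strlen
}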