I am updating an older Visual Studio project to VS2013 and keep running into an issue where the compiler does not like the parameters that I pass to strcpy.
This is a Unicode application.
I get the error -
cannot convert argument 2 from 'CString' to 'const char *'
strcpy(szFileName, m_strFileName);
m_strFileName is defined as a CString.
The strcpy function only accepts parameters of type char* (the source argument is const char*). That's what the compiler error is telling you: you have a type mismatch. In a Windows environment, char* means narrow (i.e., ANSI) strings, which effectively nobody has used for well over a decade.
You know this already; you say that you're building a Unicode application, which is what you should be doing. But that means you can't call the narrow string functions (str*) anymore. You have two options. Either:
Explicitly call the "wide" (i.e., Unicode) variants of the C string library functions, which are prefixed with wcs instead of str. In this case, then, you'd be calling wcscpy.
Use the macros that map automatically to the correct variant of the C string library functions. If the _UNICODE symbol is defined (as it would be for you), they will map to the wide-string variants; otherwise, they map to the narrow-string variants. These functions (actually macros) are all prefixed with _tcs. In this case, then, you'd call _tcscpy.
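For example, the failing call from the question could become either of the following. This is only a sketch: the declaration and size of szFileName are assumptions, since they aren't shown in the question.
#include <atlstr.h>   // CString
#include <tchar.h>    // TCHAR, _tcscpy
TCHAR szFileName[MAX_PATH];                 // assumed buffer; in a Unicode build TCHAR is wchar_t
CString m_strFileName = _T("example.txt");  // stands in for the member from the question
wcscpy(szFileName, m_strFileName);          // option 1: explicit wide variant (CString converts to const wchar_t*)
_tcscpy(szFileName, m_strFileName);         // option 2: generic-text macro, maps to wcscpy when _UNICODE is defined
(VS2013 will still flag wcscpy/_tcscpy with a C4996 deprecation warning and suggest the bounds-checked wcscpy_s/_tcscpy_s variants, which additionally take the destination size.)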
Related
The wchar_t type is used extensively in the Windows API and in the C++ standard library APIs derived from it, so it's hard to change Windows code to use something else: you would have to cast or convert back and forth every time.
But on non-Windows platforms wide characters are rarely used and UTF-8 encoding is preferred instead. Code that uses wchar_t outside Windows is therefore probably doing something wrong, and even if it's intentional, it's better to use types that communicate the intent more clearly, e.g. std::u16string and char16_t when dealing with UTF-16 strings instead of std::wstring, and std::u32string and char32_t when the intent is storing Unicode code points.
Is there a GCC option to turn on a project-wide diagnostic that warns or errors when it sees a wchar_t, thereby identifying potential sites for refactoring?
This is a bit of a workaround, not specific to GCC, and it will break your build, but it allows you to find where you use wchar_t. (It also more or less breaks included third-party code.)
You can override the definition of wchar_t with the preprocessor, which then produces errors at every usage. That way you can find the potential usages:
#define wchar_t void
wchar_t Foo() { }
int main()
{
auto wchar_used = Foo();
}
Error message:
error: 'void wchar_used' has incomplete type
10 | auto wchar_used = Foo();
In the past, I used CT2W and CT2A to convert strings between Unicode and ANSI. Now it seems that CStringW and CStringA can also do the same task.
I wrote the following code snippet:
CString str = _T("Abc");
CStringW str1;
CStringA str2;
CT2W str3(str);
CT2A str4(str);
str1 = str;
str2 = str;
It seems CStringW and CStringA also perform conversions by using WideCharToMultiByte when I assign str to them.
So, what are the advantages of using CT2W/CT2A instead of CStringW/CStringA? I have never heard of the latter pair being used this way, nor does Microsoft recommend it for doing the conversion.
CString offers a number of conversion constructors to convert between ANSI and Unicode encoding. They are as convenient as they are dangerous, often masking bugs. MFC allows you to disable implicit conversion by defining the _CSTRING_DISABLE_NARROW_WIDE_CONVERSION preprocessor symbol (which you probably should). Conversions always involve creating a new CString object with heap-allocated storage (ignoring the short string optimization).
By contrast, the Cs2d macros (where s = source, d = destination) work on raw C-style strings; no CString instances are created in the process of converting between character encodings. A temporary buffer of 128 code units is always allocated on the stack, in addition to a heap-allocated buffer in case the conversion requires more space.
Both of the above perform a conversion with an implied ANSI codepage (either CP_THREAD_ACP or CP_ACP in case the _CONVERSION_DONT_USE_THREAD_LOCALE preprocessor symbol is defined). CP_ACP is particularly troublesome, as it's a process-global setting, that any thread can change at any time.
Which one should you choose for your conversions? Neither of the above. Use the EX versions instead (see string and text classes for a full list). Those are implemented as class templates that give you the control you need to reliably perform your conversions. The template non-type parameter lets you control the size of the static buffer. More importantly, those class templates have constructors with an explicit codepage parameter, so you can perform the conversion you want (including from and to UTF-8) and document your intent in code.
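For illustration, an explicit-codepage conversion with the EX helpers might look like this. It's only a sketch: the buffer length of 64 and the sample string are made up for the example.
#include <atlstr.h>    // CStringA / CStringW
#include <atlconv.h>   // CA2WEX / CW2AEX
const char* utf8Text = "Gr\xC3\xBC\xC3\x9F""e";       // "Grüße" as raw UTF-8 bytes
CStringW wide( CA2WEX<64>( utf8Text, CP_UTF8 ) );     // UTF-8 -> UTF-16, codepage stated explicitly
CStringA roundTrip( CW2AEX<64>( wide, CP_UTF8 ) );    // UTF-16 -> UTF-8
Because the codepage is spelled out at the call site, a reader can immediately tell which conversion was intended.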
I am checking the documentation for CString. It contains the following statements:
CStringT( LPCSTR lpsz ): Constructs a Unicode CStringT from an ANSI string. You can also use this constructor to load a string resource as shown in the example below.
CStringT( LPCWSTR lpsz ): Constructs a CStringT from a Unicode string.
CStringT( const unsigned char* psz ): Allows you to construct a CStringT from a pointer to unsigned char.
I have some questions:
Why are there two versions, one for const char* (LPCSTR) and one for unsigned char*? Which version should I use in which cases? For example, does CStringT("Hello") use the first or the second version? When getting a null-terminated string from a third party, such as sqlite3_column_text() (see here), should I convert it to char* or unsigned char*? I.e., should I use CString((LPCSTR)sqlite3_column_text(...)) or CString(sqlite3_column_text(...))? It seems that both will work, is that right?
Why does the char* version construct a "Unicode" CStringT while the unsigned char* version constructs just a CStringT? CStringT is a templated class covering all 3 instantiations, i.e., CString, CStringA, and CStringW, so why the emphasis on "Unicode" CStringT when constructing from LPCSTR (const char*)?
LPCSTR is just const char*, not const signed char*. char is signed or unsigned depending on compiler implementation, but char, signed char, and unsigned char are 3 distinct types for purposes of overloading. String literals in C++ are of type const char[], so CStringT("Hello") will always use the LPCSTR constructor, never the unsigned char* constructor.
sqlite3_column_text(...) returns unsigned char* because it returns UTF-8 encoded text. I don't know what the unsigned char* constructor of CStringT actually does (it has something to do with MBCS strings), but the LPCSTR constructor performs a conversion from ANSI to UNICODE using the user's default locale. That would destroy UTF-8 text that contains non-ASCII characters.
Your best option in that case is to convert the UTF-8 text to UTF-16 (using MultiByteToWideChar() or equivalent, or simply using sqlite3_column_text16() instead, which returns UTF-16 encoded text), and then use the LPCWSTR (const wchar_t*) constructor of CStringT, as Windows uses wchar_t for UTF-16 data.
tl;dr: Use either of the following:
CStringW value( sqlite3_column_text16() ); (optionally setting SQLite's internal encoding to UTF-16), or
CStringW value( CA2WEX( sqlite3_column_text(), CP_UTF8 ) );
Everything else is just not going to work out, one way or another.
First things first: CStringT is a class template, parameterized (among other things) on the character type it uses to represent the stored sequence. This is passed as the BaseType template type argument. There are 2 concrete template instantiations, CStringA and CStringW, that use char and wchar_t to store the sequence of characters, respectively1.
CStringT exposes the following predefined types that describe the properties of the template instantiation:
XCHAR: Character type used to store the sequence.
YCHAR: Character type that an instance can be converted from/to.
The following table shows the concrete types for CStringA and CStringW:
         | XCHAR   | YCHAR
---------+---------+---------
CStringA | char    | wchar_t
CStringW | wchar_t | char
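(As a compile-time illustration of that table, not something from the original answer, the typedefs can be checked with static_assert, assuming the ATL headers are available:)
#include <atlstr.h>
#include <type_traits>
static_assert(std::is_same<CStringA::XCHAR, char>::value,    "CStringA stores char");
static_assert(std::is_same<CStringA::YCHAR, wchar_t>::value, "CStringA converts to/from wchar_t");
static_assert(std::is_same<CStringW::XCHAR, wchar_t>::value, "CStringW stores wchar_t");
static_assert(std::is_same<CStringW::YCHAR, char>::value,    "CStringW converts to/from char");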
While the storage of the CStringT instantiations imposes no restrictions with respect to the character encoding being used, the conversion c'tors and operators are implemented based on the following assumptions:
char represents ANSI2 encoded code units.
wchar_t represents UTF-16 encoded code units.
If your program doesn't match those assumptions, it is strongly advised to disable implicit wide-to-narrow and narrow-to-wide conversions. To do this, define the _CSTRING_DISABLE_NARROW_WIDE_CONVERSION preprocessor symbol prior to including any ATL/MFC header files. Doing so is recommended even if your program meets the assumptions, to prevent accidental conversions, which are both costly and potentially destructive.
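A minimal sketch of what that looks like (the string contents are made up):
#define _CSTRING_DISABLE_NARROW_WIDE_CONVERSION   // must appear before any ATL/MFC header
#include <atlstr.h>
CStringW wide(L"text");     // still fine: no conversion involved
// CStringA narrow(wide);   // with the symbol defined, this line no longer compiles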
With that out of the way, let's move on to the questions:
Why are there two versions, one for const char* (LPCSTR) and one for unsigned char*?
That's easy: Convenience. The overload simply allows you to construct a CString instance irrespective of the signedness of the character type3. The implementation of the overload taking a const unsigned char* argument 'forwards' to the c'tor taking a const char*:
CSTRING_EXPLICIT CStringT(_In_z_ const unsigned char* pszSrc) :
CThisSimpleString( StringTraits::GetDefaultManager() )
{
*this = reinterpret_cast< const char* >( pszSrc );
}
Which version should I use for different cases?
It doesn't matter, as long as you are constructing a CStringA, i.e. no conversion is applied. If you are constructing a CStringW, you shouldn't be using either of those (as explained above).
For example, does CStringT("Hello") use the first or second version?
"Hello" is of type const char[6], that decays into a const char* to the first element in the array, when passed to the CString c'tor. It calls the overload taking a const char* argument.
When getting a null-terminated string from a third-party, such as sqlite3_column_text() (see here), should I convert it to char* or unsigned char *? ie, should I use CString((LPCSTR)sqlite3_column_text(...)) or CString(sqlite3_column_text(...))?
SQLite assumes UTF-8 encoding in this case. CStringA can store UTF-8 encoded text, but it's really, really dangerous to do so. CStringA assumes ANSI encoding, and readers of your code likely will, too. The recommendation is to either change your SQLite database to store UTF-16 (and use sqlite3_column_text16 to construct a CStringW), or, if that is not feasible, to manually convert from UTF-8 to UTF-16 before storing the data in a CStringW instance, using the CA2WEX macro:
CStringW data( CA2WEX( sqlite3_column_text(), CP_UTF8 ) );
It seems that both will work, is that right?
That's not correct. Neither one works as soon as you get non-ASCII characters from your database.
Why does the char* version construct a "Unicode" CStringT but the unsigned char* version will construct a CStringT?
That looks to be the result of the documentation trying to be compact. A CStringT is a class template; it is neither Unicode nor ANSI, and as a template it doesn't even exist as a concrete type until instantiated. I'm guessing that the remarks section on the constructors is meant to highlight the ability to construct Unicode strings from ANSI input (and vice versa). This is briefly mentioned, too ("Note that some of these constructors act as conversion functions.").
To sum this up, here is a list of generic advice when using MFC/ATL strings:
Prefer using CStringW. This is the only string type whose implied character encoding is unambiguous (UTF-16).
Use CStringA only when interfacing with legacy code. Make sure to unambiguously note the character encoding used. Also make sure to understand that the "currently active locale" can change at any time. See Keep your eye on the code page: Is this string CP_ACP or UTF-8? for more information.
Never use CString. Just by looking at code, it is not clear what type this is (it could be either of 2 types). Likewise, when looking at a constructor invocation, it is not possible to tell whether it is a copy or a conversion operation.
Disable implicit conversions for the CStringT class template instantiations.
1 There's also CString, which uses the generic-text mapping TCHAR as its BaseType. TCHAR expands to either char or wchar_t, depending on preprocessor symbols. CString is thus an alias for either CStringA or CStringW, depending on those very same preprocessor symbols. Unless you are targeting Win9x, don't use any of the generic-text mappings.
2 Unlike Unicode encodings, ANSI is not a self-contained representation. Interpretation of code units depends on external state (the currently active locale). Do not use unless you are interfacing with legacy code.
3 It is implementation-defined whether char is interpreted as signed or unsigned. Either way, char, unsigned char, and signed char are 3 distinct types. By default, Visual Studio interprets char as signed.
I have searched the internet for about 2 hours and haven't found any working solution.
My program uses the multibyte character set; in the code I have:
WCHAR value[1];
_tcslen(value);
And when compiling, I get the error:
'strlen' : cannot convert parameter 1
from 'WCHAR [1]' to 'const char *'
How do I convert this WCHAR[1] to const char*?
I assume the not-very-useful 1-length WCHAR array is just an example...
_tcslen isn't a function; it's a macro that expands to strlen or wcslen according to the _UNICODE define. If you're using the multibyte character set (which means _UNICODE ISN'T defined) and you have a WCHAR array, you'll have to use wcslen explicitly.
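For instance (a sketch; the array contents are made up in place of the 1-element array from the question):
#include <windows.h>   // WCHAR
#include <wchar.h>     // wcslen
WCHAR value[] = L"some text";
size_t len = wcslen(value);    // works regardless of whether _UNICODE is defined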
(Unless you're using TCHAR specifically, you probably want to avoid the _t functions. There's a not-very-good overview in MSDN at http://msdn.microsoft.com/en-us/library/szdfzttz(VS.80).aspx; does anybody know of anything better?)
Try setting "use unicode" in your projects settings in VS if you want _tcslen to be wcslen, set it to "use multibyte" for _tcslen to be strlen. As some already pointed out, _t prefixed functions (as many others actually, e.g. MessageBox()) are macros that are "expanded" based on the _UNICODE precompiler define value.
Another option would be to use TCHAR instead of WCHAR in your code. Although I'd say a better idea would be to just stick to either wchar_t or char types and use appropriate functions with them.
And last but not least, as the question is tagged c++, consider using std::string (or std::wstring for that matter) and std::vector instead of char buffers. Using those with C API functions is trivial and generally a lot safer.
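For example, something along these lines (a sketch that uses GetModuleFileNameW purely for illustration; error handling is omitted):
#include <windows.h>
#include <string>
std::wstring path(MAX_PATH, L'\0');
DWORD written = ::GetModuleFileNameW(nullptr, &path[0], static_cast<DWORD>(path.size()));
path.resize(written);   // trim to the length actually reported by the API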
Use the conversion function WideCharToMultiByte().
http://msdn.microsoft.com/en-us/library/dd374130(VS.85).aspx
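A rough sketch of how that could look when converting a wide string to UTF-8 (the helper name ToUtf8 is made up, CP_UTF8 is a choice rather than a requirement, and error handling is kept minimal):
#include <windows.h>
#include <string>
// Hypothetical helper: converts a null-terminated UTF-16 string to UTF-8.
std::string ToUtf8(const wchar_t* wide)
{
    // First call only asks for the required buffer size (including the terminator).
    int len = WideCharToMultiByte(CP_UTF8, 0, wide, -1, nullptr, 0, nullptr, nullptr);
    if (len <= 0)
        return std::string();
    std::string result(len, '\0');
    WideCharToMultiByte(CP_UTF8, 0, wide, -1, &result[0], len, nullptr, nullptr);
    result.resize(len - 1);   // drop the trailing null terminator written by the API
    return result;
}
Pass CP_ACP instead of CP_UTF8 if the ANSI codepage is really what you need.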
_tcslen takes WCHAR pointers or arrays as input only when _UNICODE is #defined in your environment.
From the error message, I'd say that it isn't.
The best way to define Unicode is to pass it as a parameter to the compiler. For Microsoft C, you would use the /D switch:
cl -c /D_UNICODE /DUNICODE myfile.cpp
Or you could change your array declaration to TCHAR, which, like _tcslen, will be char when Unicode is not #defined and WCHAR when it is.
Why do I get the following warning for the following code? :)
Code:
_stprintf(m_szFileNamePath,_T("%s"),strFileName);
warning C4996: '_swprintf': swprintf has been changed to conform with the ISO C standard, adding an extra character count parameter. To use traditional Microsoft swprintf, set _CRT_NON_CONFORMING_SWPRINTFS.
I know _stprintf is a macro which, if _UNICODE is defined, evaluates to _swprintf, otherwise to sprintf.
Now what is this _swprintf? There is a function swprintf, so why does _stprintf evaluate to _swprintf instead of swprintf?
What is the difference between the _xxx and xxx functions?
EDIT:
Okay, there are two definitions for the Unicode version of _stprintf; which one is used?
The one in tchar.h or strsafe.h?
http://msdn.microsoft.com/en-us/library/ybk95axf%28VS.80%29.aspx
swprintf is a wide-character version of sprintf; the pointer arguments to swprintf are wide-character strings. Detection of encoding errors in swprintf may differ from that in sprintf. swprintf and fwprintf behave identically except that swprintf writes output to a string rather than to a destination of type FILE, and swprintf requires the count parameter to specify the maximum number of characters to be written. The versions of these functions with the _l suffix are identical except that they use the locale parameter passed in instead of the current thread locale.
In Visual C++ 2005, swprintf conforms to the ISO C Standard, which requires the second parameter, count, of type size_t. To force the old nonstandard behavior, define _CRT_NON_CONFORMING_SWPRINTFS. In a future version, the old behavior may be removed, so code should be changed to use the new conformant behavior.
Maybe this, using the secure variant that takes the buffer size in elements (256 is a guess at the buffer's size)?
_stprintf_s(m_szFileNamePath, 256, _T("%s"), (LPCTSTR)strFileName);
Microsoft provides its own CRT extension, _swprintf, which is not compatible (for example) with Unix.
Microsoft prefixes (or used to prefix?) otherwise widely available functions that are not part of the C standard and not Win32 APIs with an underscore.
This should work:
wchar_t buf[100];
int len = swprintf( buf, 100, L"%s", L"Hello world" );