The 'L' and 'LPCWSTR' in WIndows API - c++

I've found that
NetUserChangePassword(0, 0, L"ab", L"cd");
changes the user password from ab to cd. However,
NetUserChangePassword(0, 0, (LPCWSTR) "ab", (LPCWSTR) "cd");
doesn't work. The returned value indicates invalid password.
I need to pass const char* as last two parameters for this function call. How can I do that? For example,
NetUserChangePassword(0, 0, (LPCWSTR) vs[0].c_str(), (LPCWSTR) vs[1].c_str());
Where vs is std::vector<std::string>.

Those are two totally different L's. The first is a part of the C++ language syntax. Prefix a string literal with L and it becomes a wide string literal; instead of an array of char, you get an array of wchar_t.
The L in LPCWSTR doesn't describe the width of the characters, though. Instead, it describes the size of the pointer. Or, at least, it used to. The L abbreviation on type names is a relic of 16-bit Windows, when there were two kinds of pointers. There were near pointers, where the address was somewhere within the current 64 KB segment, and there were far, or long pointers, which could point beyond the current segment. The OS required callers to provide the latter to its APIs, so all the pointer-type names use LP. Nowadays, there's only one type of pointer; Microsoft keeps the same type names so that old code continues to compile.
The part of LPCWSTR that specifies wide characters is the W. But merely type-casting a char string literal to LPCWSTR is not sufficient to transform those characters into wide characters. Instead, what happens is the type-cast tells the compiler that what you wrote really is a pointer to a wide string, even though it really isn't. The compiler trusts you. Don't type-cast unless you really know better than the compiler what the real types are.
If you really need to pass a const char*, then you don't need to type-cast anything, and you don't need any L prefix. A plain old string literal is sufficient. (If you really want to cast to a Windows type, use LPCSTR — no W.) But it looks like what you really need to pass in is a const wchar_t*. As we learned above, you can get that with the L prefix on the string literal.
In a real program, you probably don't have a string literal. The user will provide a password, or you'll read a password from some other external source. Ideally, you would store that password in a std::wstring, which is like std::string but for wchar_t instead of char. The c_str() method of that type returns a const wchar_t*. If you don't have a wstring, a plain array of wchar_t might be sufficient.
But if you're storing the password in a std::string, then you'll need to convert it into wide characters some other way. To do a conversion, you need to know what code page the std::string characters use. The "current ANSI code page" is usually a safe bet; it's represented by the constant CP_ACP. You'll use that when calling MultiByteToWideString to have the OS convert from the password's code page into Unicode.
int required_size = MultiByteToWideChar(CP_ACP, 0, vs[0].c_str(), vs[0].size(), NULL, 0);
if (required_size == 0)
ERROR;
// We'll be storing the Unicode password in this vector. Reserve at
// least enough space for all the characters plus a null character
// at the end.
std::vector<wchar_t> wv(required_size);
int result = MultiByteToWideChar(CP_ACP, 0, vs[0].c_str(), vs[0].size(), &wv[0], required_size);
if (result != required_size - 1)
ERROR;
Now, when you need a wchar_t*, just use a pointer to the first element of that vector: &wv[0]. If you need it in a wstring, you can construct it from the vector in a few ways:
// The vector is null-terminated, so use "const wchar_t*" constructor
std::wstring ws1 = &wv[0];
// Use iterator constructor. The vector is null-terminated, so omit
// the final character from the iterator range.
std::wstring ws2(wv.begin(), wv.end() - 1);
// Use pointer/length constructor.
std::wstring ws3(&wv[0], wv.size() - 1);

You have two problems.
The first is the practical problem - how to do this. You are confusing wide and narrow strings and casting from one to the other. A string with an L prefix is a wide string, where each character is two bytes (a wchar_t). A string without the L is a single byte (a char). You cannot cast from one to the other using the C-style cast (LPCWSTR) "ab" because you have an array of chars, and are casting it to a pointer to wide chars. It is simply changing the pointer type, not the underlying data.
To convert from a narrow string to a wide string, you would normally use MultiByteToWideChar. You don't mention what code page your narrow strings are in; you would probably pass in CP_ACP for the first parameter. However, since you are converting between a string and a wstring, you might be interested in other ways to convert (one, two). This will give you a wstring with your characters, not a string, and a wstring's .c_str() method returns a pointer to wchar_ts.
The second is the following misunderstanding:
I need to pass const char* as last two parameters for this function call. How can I do that?
No you don't. You need to pass a wide string, which you got above. Your approach to this (casting the pointer) indicates you probably don't know about different string types and character encodings, and this is something every software developer should know. So, on the assumption you're interested, hopefully you'll find the following references handy:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
The Unicode FAQ (covers things like 'What is Unicode?')
MSDN introduction to wide characters.
I'd recommend you investigate recompiling your application with UNICODE and using wide strings. Many APIs are defined in both narrow and wide versions, and normally this would mean you access the narrow version by default (you can access either the ANSI (narrow) or Wide versions of these APIs by directly calling the A or W version - they have A or W appended to their name, such as CreateWindowW - see the bottom of that page for the two names. You normally don't need to worry about this.) As far as I can tell, this API is always available as-is regardless of UNICODE, it's just it's only prototyped as wide.

C style casts as you've used here are a very blunt instrument. They assume you know exactly what you're doing.
You'll need to convert your ASCII or multi-byte strings into Unicode strings for the API. There might be a NetUserChangePasswordA function that takes the char * types you're trying to pass, try that first.

LPWSTR is defined wchar_t* (whcar_T is 2 byte char-type) which are interpreted differently then normal 1 byte chars.

LPCWSTR means you need to pass a wchar_t*.
if you change vs to std::vector<std::wstring> that will give you a wide char when you pass vs[0].c_str()
if you look at the example at http://msdn.microsoft.com/en-us/library/aa370650(v=vs.85).aspx you can see that they define UNICODE which is why they use the wchar_t.

Related

__int64 to CString returns wrong values - C++ MFC

I want to convert a __int64 variable into a CString. The code is exactly this
__int64 i64TotalGB;
CString totalSpace;
i64TotalGB = 150;
printf("disk space: %I64d GB\n", i64TotalGB);
totalSpace.Format(_T("%I64d", i64TotalGB));
printf("totalSpace contains: %s", totalSpace);
the first printf prints
"disk space: 150GB"
which it's correct, but the second printf prints randomly high numbers like
"totalSpace contains: 298070026817519929"
I also tried to use a INT64 variable instead of a __int64 variable, but the result is the same. What can be the cause of this?
Here:
totalSpace.Format(_T("%I64d", i64TotalGB));
you're passing i64TotalGB as an argument to the _T() macro instead of passing it as the second argument to Format().
Try this:
totalSpace.Format(_T("%I64d"), i64TotalGB);
Having said that, thanks to MS's mess (ha) around character encodings, using _T here is not the right thing, as CString is composed of a TCHAR and not _TCHAR. So taking that into account, might as well use TEXT() instead of T(), as it is dependent on UNICODE and not _UNICODE:
totalSpace.Format(TEXT("%I64d"), i64TotalGB);
In addition, this line is wrong as it tries to pass an ATL CString as a char* (a.k.a. C-style string):
printf("totalSpace contains: %s", totalSpace);
For which the compiler gives this warning:
warning C4477: 'printf' : format string '%s' requires an argument of type 'char *', but variadic argument 1 has type 'ATL::CString'
While the structure of CString is practically compatible with passing it like you have, this is still formally undefined behavior. Use CString::GetString() to safeguard against it:
printf("totalSpace contains: %ls", totalSpace.GetString());
Note the %ls as under my configuration totalSpace.GetString() returned a const wchar_t*. However, as "printf does not currently support output into a UNICODE stream.", the correct version for this line, that will support characters outside your current code page, is a call to wprintf() in the following manner:
wprintf("totalSpace contains: %s", totalSpace.GetString());
Having said ALL that, here's a general advice, regardless of the direct problem behind the question. The far better practice nowadays is slightly different altogether, and I quote from the respectable answer by #IInspectable, saying that "generic-text mappings were relevant 2 decades ago".
What's the alternative? In the absence of good enough reason, try sticking explicitly to CStringW (A Unicode character type string with CRT support). Prefer the L character literal over the archaic data/text mappings that depend on whether the constant _UNICODE or _MBCS has been defined in your program. Conversely, the better practice would be using the wide-character versions of all API and language library calls, such as wprintf() instead of printf().
The bug is a result of numerous issues with the code, specifically these 2:
totalSpace.Format(_T("%I64d", i64TotalGB));
This uses the _T macro in a way it's not meant to be used. It should wrap a single character string literal. In the code it wraps a second argument.
printf("totalSpace contains: %s", totalSpace);
This assumes an ANSI-encoded character string, but passes a CString object, that can store both ANSI as well as Unicode encoded strings.
The recommended course of action is to drop generic-text mappings altogether, in favor of using Unicode (that's UTF-16LE on Windows) throughout1. The generic-text mappings were relevant 2 decades ago, to ease porting of Win9x code to the Windows NT based products.
To do this
Choose CStringW over CString.
Drop all occurences of _T, TEXT, and _TEXT, and replace them with an L prefix.
Use the wide-character version of the Windows API, CRT, and C++ Standard Library.
The fixed code looks like this:
__int64 i64TotalGB;
CStringW totalSpace; // Use wide-character string
i64TotalGB = 150;
printf("disk space: %I64d GB\n", i64TotalGB);
totalSpace.Format(L"%I64d", i64TotalGB); // Use wide-character string literal
wprintf(L"totalSpace contains: %s", totalSpace.GetString()); // Use wide-character library
On an unrelated note, while it is technically safe to pass a CString object in place of a character pointer in a variable argument list, this is an implementation detail, and not formally documented to work. Call CString::GetString() if you care about correct code.
1 Unless there is a justifiable reason to use a character encoding that uses char as its underlying type (like UTF-8 or ANSI). In that case you should still be explicit about it by using CStringA.
try this
totalSpace.Format(_T("%I64d"), i64TotalGB);

Why are so many string types in C++?

I like C, I have a C book called Full and Full C and 2 C++, I find these languages fantastic because of their incredible power and performance, but I have to end many of my projects because of these various types.
I say to have std::string, LPCSTR, System::String, TCHAR [], char s [] = "ss", char * s?
This causes tremendous headaches mainly in GUI applications, WinAPI has the problem of LPCSTR not being compatible with char or std::string, and now in CLR applications if it has System::String that gives a lot of headache to convert to std::String or char * s or even char s [].
Why don’t C/C++ have its string type unique like String in Java?
There are no "many types of string in c++". Canonically there is one template std::basic_string, which is basically a container specialized for strings of different character types.
std::string is a convenience typedef onto std::basic_string<char>. There are more such typedefs for different underlying character types.
AFAIK, standard c has also only one officially recognized string standard. It's ANSI-string i.e. a null terminated array of char.
All other you mention are either equivalent of this (e.g. LPCSTR is a long pointer to a constant string i.e. const char*), or some non-standard extensions written by library providers.
Your question is like asking why there are so many GUI libraries. Because there is no standard way to do this, or standard way is lacking in some way, and it was a design decision to provide and support own equivalent type.
Bottom line is, that on the library level, or language level, it's a design decision between different trade-offs. Simplicity, performance, character support, etc. etc. In general, storing text is hard.
Well, first we must answer the question: What is a string?
The C-standard defines it as a contiguous sequence of characters terminated by and including the first null character.1
It also mentions varieties using wchar_t, char16_t, or char32_t instead of char.
It also provides many functions for string-manipulation, and string-literals for notational convenience.
So, a sequence of characters can be a string, a char[] might hold a string, and a char* might point to one.
LPCSTR is a windows typedef for const char* with the added semantics that it should point to a string or be NULL.
TCHAR is one of a number of preprocessor-defines used for transitioning windows code from char to wchar_t. Depending on what TCHAR is, a TCHAR[] might be able to hold a string, or a wide-string.
C++ mixes up things a bit because it adds a data-type for handling strings. To reduce ambiguity, string is only used for the abstract concept, you have to rely on the context to disambiguate or be more explicit.
So the C string corresponds with the C++ null-terminated-byte-string, or NTBS.2
Yes, C++ also knows their wide varieties.
And C++ incorporates the C functions and adds some more.
In addition, C++ has std::basic_string<> for storing all kinds of counted strings, and some convenience-typedefs like std::string.
And now we get to the third language yet, namely C++/CLI.
Which incorporates all I spoke above from C++, and adds the CLI type System::String into the mix.
System::String is an immutable UTF-16 counted-string.
Now to answer the question why C++ does not define one single concrete type to be a string can be answered:
There are different types of string in C++ for interoperability, history, efficiency and convenience. Always use the right tool for the job.
Java and .Net do the same with byte-arrays, char-arrays, string-builders and the like.
Reference 1: C11 final draft, definition of string:
7. Library
7.1 Introduction
7.1.1 Definitions of terms
1 A string is a contiguous sequence of characters terminated by and including the first null character. The term multibyte string is sometimes used instead to emphasize special processing given to multibyte characters contained in the string or to avoid confusion with a wide string. A pointer to a string is a pointer to its initial (lowest addressed) character. The length of a string is the number of bytes preceding the null character and the value of a string is the sequence of the values of the contained characters, in order.
Reference 2: C++1z draft n4659 NTBS:
20.4.2.1.5.1 Byte strings [byte.strings]
1 A null-terminated byte string, or NTBS, is a character sequence whose highest-addressed element with defined content has the value zero (the terminating null character); no other element in the sequence has the value zero.163
2 The length of an NTBS is the number of elements that precede the terminating null character. An empty ntbs has a length of zero.
3 The value of an NTBS is the sequence of values of the elements up to and including the terminating null character.
4 A static NTBS is an NTBS with static storage duration.164
Each string type has his own target, the std::string is for standard library and this is the most common.
As you say C++ has power and performance and these strings allow more flexibility. Char[] and char* you can use for a more generic use of string.

How to tell if LPCWSTR text is numeric?

Entire string needs to be made of integers which as we know are 0123456789 I am trying with following function but it doesnt seem to work
bool isNumeric( const char* pszInput, int nNumberBase )
{
string base = "0123456789";
string input = pszInput;
return (::strspn(input.substr(0, nNumberBase).c_str(), base.c_str()) == input.length());
}
and the example of using it in code...
isdigit = (isNumeric((char*)text, 11));
It returns true even with text in the string
Presumably the issue is that text is actually LPCWSTR which is const wchar_t*. We have to infer this fact from the question title and the cast that you made.
Now, that cast is a problem. The compiler objected to you passing text. It said that text is not const char*. By casting you have not changed what text is, you simply lied to the compiler. And the compiler took its revenge.
What happens next is that you reinterpret the wide char buffer as being a narrow 8 bit buffer. If your wide char buffer has latin text, encoded as UTF-16, then every other byte will be zero. Hence the reinterpret cast that you do results in isNumeric thinking that the string is only 1 character long.
What you need to do is either:
Start using UTF-16 encoded wchar_t buffers in isNumeric.
Convert from UTF-16 to ANSI before calling isNumeric.
You should think about this carefully. It seems that at present you have a rather unholy mix of ANSI and UTF-16 in your program. You really ought to settle on a standard character encoding an use it consistently throughout. That is tenable internal to your program, but you will encounter external text that could use different encodings. Deal with that by converting at the boundary between your program and the outside world.
Personally I don't understand why you are using C strings at all. Surely you should be using std::wstring or std::string.

Have a PCWSTR and need it to be a WCHAR[]

I am re-writing a C++ method from some code I downloaded. The method originally took a PCWSTR as a parameter and then prompted the user to enter a file name. I modified the method to take two parameters (both PCWSTR) and not to prompt the user. I am already generating the list of files somewhere else. I am attempting to call my new (modified) method with both parameters from my method that iterates the list of files.
The original method prompted the user for input using a StringCBGetsW command. Like this...
HRESULT tst=S_OK; //these are at the top of the method
WCHAR fname[85] = {0}; //these are at the top of the method
tst = StringCbGetsW(fname,sizeof(fname));
The wchar fname gets passed to another iteration method further down. When I look at that method, it says it's a LPCWSTR type; I'm assuming it can take the WCHAR instead.
But what it can't do is take the PCWSTR that the method got handed. My ultimate goal is to try not prompt the user for the file name and to take instead the filename that was iterated earlier in another method.
tl;dr. I have a PCWSTR and it needs to get converted to a WCHAR. I don't know what a WCHAR [] is or how to do anything with it. Including to try to do a printf to see what it is.
PS...I know there are easier ways to move and copy around files, there is a reason I'm attempting to make this work using a program.
First, let's try to make some clarity on some Windows specific types.
WCHAR is a typedef for wchar_t.
On Windows with Microsoft Visual C++, it's a 16-bit character type (that can be used for Unicode UTF-16 strings).
PCWSTR and LPCWSTR are two different names for the same thing: they are basically typedefs for const wchar_t*.
The initial L in LPCWSTR is some legacy prefix that, read with the following P, stands for "long pointer". I've never programmed Windows in the 16-bit era (I started with Windows 95 and Win32), but my understanding is that in 16-bit Windows there were something like near pointers and far, or long pointers. Now we have just one type of pointers, so the L prefix can be omitted.
The P stands for "pointer".
The C stands for "constant".
The W stands for WCHAR/wchar_t, and last but not least, the STR part stands for "string".
So, decoding this kind of "Hungarian Notation", PCWSTR means const wchar_t*.
Basically, it's a pointer to a read-only NUL-terminated wchar_t Unicode UTF-16 string.
Is this information enough for you to solve your problem?
If you have a wchar_t string buffer, and a function that expects a PCWSTR, you can just pass the name of the buffer (corresponding the the address of its first character) to the function:
WCHAR buffer[100];
DoSomething(buffer, ...); // DoSomething(PCWSTR ....)
Sometimes - typically for output string parameters - you may also want to specify the size (i.e. "capacity") of the destination string buffer.
If this size is expressed using a count in characters (in this case, in wchar_ts), the the usual Win32 Hungarian Notation is cch ("count of characters"); else, if you want the size expressed in bytes, then the usual prefix is cb ("count of bytes").
So, if you have a function like StringCchCopy(), then from the Cch part you know the size is expressed in characters (wchar_ts).
Note that you can use _countof() to get the size of a buffer in wchar_ts.
e.g. in the above code snippet, _countof(buffer) == 100, since buffer is made by 100 wchar_ts; instead, sizeof(buffer) == 200, since each wchar_t is 2 bytes == 16 bits in size, so the total buffer size in bytes is 100 [wchar_t] * 2 [bytes/wchar_t] = 200 [bytes].

Difference between unsigned char and char pointers

I'm a bit confused with differences between unsigned char (which is also BYTE in WinAPI) and char pointers.
Currently I'm working with some ATL-based legacy code and I see a lot of expressions like the following:
CAtlArray<BYTE> rawContent;
CALL_THE_FUNCTION_WHICH_FILLS_RAW_CONTENT(rawContent);
return ArrayToUnicodeString(rawContent);
// or return ArrayToAnsiString(rawContent);
Now, the implementations of ArrayToXXString look the following way:
CStringA ArrayToAnsiString(const CAtlArray<BYTE>& array)
{
CAtlArray<BYTE> copiedArray;
copiedArray.Copy(array);
copiedArray.Add('\0');
// Casting from BYTE* -> LPCSTR (const char*).
return CStringA((LPCSTR)copiedArray.GetData());
}
CStringW ArrayToUnicodeString(const CAtlArray<BYTE>& array)
{
CAtlArray<BYTE> copiedArray;
copiedArray.Copy(array);
copiedArray.Add('\0');
copiedArray.Add('\0');
// Same here.
return CStringW((LPCWSTR)copiedArray.GetData());
}
So, the questions:
Is the C-style cast from BYTE* to LPCSTR (const char*) safe for all possible cases?
Is it really necessary to add double null-termination when converting array data to wide-character string?
The conversion routine CStringW((LPCWSTR)copiedArray.GetData()) seems invalid to me, is that true?
Any way to make all this code easier to understand and to maintain?
The C standard is kind of weird when it comes to the definition of a byte. You do have a couple of guarantees though.
A byte will always be one char in size
sizeof(char) always returns 1
A byte will be at least 8 bits in size
This definition doesn't mesh well with older platforms where a byte was 6 or 7 bits long, but it does mean BYTE*, and char * are guaranteed to be equivalent.
Multiple nulls are needed at the end of a Unicode string because there are valid Unicode characters that start with a zero (null) byte.
As for making the code easier to read, that is completely a matter of style. This code appears to be written in a style used by a lot of old C Windows code, which has definitely fallen out of favor. There are probably a ton of ways to make it clearer for you, but how to make it clearer has no clear answer.
Yes, it is always safe. Because they both point to an array of single-byte memory locations.
LPCSTR: Long Pointer to Const (single-byte) String
LPCWSTR : Long Pointer to Const Wide (multi-byte) String
LPCTSTR : Long Pointer to Const context-dependent (single-byte or multi-byte) String
In wide character strings, every single character occupies 2 bytes of memory, and the length of the memory location containing the string must be a multiple of 2. So if you want to add a wide '\0' to the end of a string, you should add two bytes.
Sorry for this part, I do not know ATL and I cannot help you on this part, but actually I see no complexity here, and I think it is easy to maintain. What code do you really want to make easier to understand and maintain?
If the BYTE* behaves like a proper string (i.e. the last BYTE is 0), you can cast a BYTE* to a LPCSTR, yes. Functions working with LPCSTR assume zero-terminated strings.
I think the multiple zeroes are only necessary when dealing with some multibyte character sets. The most common 8-bit encodings (like ordinary Windows Western and also UTF-8) don't require them.
The CString is Microsoft's best attempt at user-friendly strings. For instance, its constructor can handle both char and wchar_t type input, regardless of whether the CString itself is wide or not, so you don't have to worry about the conversion much.
Edit: wait, now I see that they are abusing a BYTE array for storing wide chars in. I couldn't recommend that.
An LPCWSTR is a String with 2 Bytes per character, a "char" is one Byte per character. That means you cannot cast it in C-style, because you have to adjust the memory (add a "0" before each standard-ASCII), and not just read the Data in a different way from the memory (what a C-Cast would do).
So the cast is not so safe i would say.
The Double-Nulltermination: You have always 2 Bytes as one Character, so your "End-of-string" sign must be 2 Bytes long.
To make that code easier to understand look after lexical_cast in Boost (http://www.boost.org/doc/libs/1_48_0/doc/html/boost_lexical_cast.html)
Another way would be using the std::strings (using like std::basic_string; ), and you can perform on String operations.