I am trying to understand the GetBuffer() function. Looks like it returns you the pointer to the CString, which is confirmed in msdn GetBuffer(). However, I don't understand the example shown in the msdn GetBuffer().
LPTSTR p = s.GetBuffer( 10 );
Is there a reason why it's 10 inside? Can anyone show me the output of the example?
The 10 is the minimum buffer length, so if you call GetBuffer() on a CString of, say, 4 characters, it will allocate a buffer at least 10 chars long and hand you an LPTSTR to it, in case you want to strcpy a longer string into that buffer (as they do in the example). The 10 in the example is arbitrary; they could just as easily have used 6 (five letters in "Hello" plus the terminating null) or any larger number and it would have worked the same.
In general, though, you'll be better off steering clear of GetBuffer() unless you really need to use it.
I'm trying to use the StringCchCat function:
HRESULT X;
LPWSTR _strOutput = new wchar_t[100];
LPCWSTR Y =L"Sample Text";
X = StringCchCat(_strOutput, 100, Y);
But for some reason I keep getting the "E_INVALIDARG One or more arguments are invalid." error from X. _strOutput is also full of some random characters.
This is actually part of a bigger program. So what I'm trying to do is to concatenate the "Sample Text" to the empty _strOutput variable. This is inside a loop so it is going to happen multiple times. For this particular example it will be as if I'm assigning the text "Sample Text" to _strOutput.
Any Ideas?
If it's part of a loop, a simple *_strOutput = 0; will fix your issue.
If you're instead trying to copy a string, not concatenate it, there's a special function that does this for you: StringCchCopy.
Edit: As an aside, if you're using the TCHAR version of the API (and you are), you should declare your strings as TCHAR arrays (ie LPTSTR instead of LPWSTR, and _T("") instead of L""). This would keep your code at least mildly portable.
String copy/concat functions look for null terminators to know where to copy/concat to. You need to initialize the first element of _strOutput to zero so the buffer is null terminated, then you can copy/concat values to it as needed:
LPWSTR _strOutput = new wchar_t[100];
_strOutput[0] = L'\0'; // <-- add this
X = StringCchCat(_strOutput, 100, Y);
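For anyone who wants to try this outside of Windows, here is a minimal portable sketch of the same fix. It assumes std::wcsncat as a stand-in for StringCchCat (which needs the Windows headers), and the helper name BuildOutput is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: appends `piece` to a fresh buffer `times` times.
// std::wcsncat stands in for StringCchCat here; both scan the destination
// for its terminating null before appending.
std::wstring BuildOutput(const wchar_t* piece, int times)
{
    const std::size_t capacity = 100;
    wchar_t* output = new wchar_t[capacity]; // contents are indeterminate
    output[0] = L'\0';                       // make it a valid empty string

    for (int i = 0; i < times; ++i) {
        std::size_t used = std::wcslen(output);
        assert(used < capacity);
        // Leave room for the terminator that wcsncat always writes.
        std::wcsncat(output, piece, capacity - used - 1);
    }

    std::wstring result(output);
    delete[] output;
    return result;
}
```

Without the output[0] = L'\0' line, the first wcslen/wcsncat call would walk through whatever garbage new[] returned, which is the same failure the question describes.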
I'm writing this answer to notify you (so you see the red 1 at the top of any Stack Overflow page) because you had the same bug yesterday (in your message box) and I now realize I neglected to say this in my answer yesterday.
Keep in mind that the new[] operator on a built-in type like WCHAR or int does NOT initialize the data at all. The memory you get will have whatever garbage was there before the call to new[], whatever that is. The same happens if you say WCHAR x[100]; as a local variable. You must be careful to initialize data before using it. Compilers are usually good at warning you about this. (I believe C++ objects have their constructors called for each element, so that won't give you an error... unless you forget to initialize something in the class, of course. It's been a while.)
In many cases you'll want everything to be zeroes. The '\0'/L'\0' character is also a zero. The Windows API has a function ZeroMemory() that's a shortcut for filling memory with zeroes:
ZeroMemory(array, size of array in bytes)
So to initialize a WCHAR str[100] you can say
ZeroMemory(str, 100 * sizeof (WCHAR))
where the sizeof (WCHAR) turns 100 WCHARs into its equivalent byte count.
As the other answers say, simply setting the first character of a string to zero will be sufficient for a string. Your choice.
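A small portable sketch of the options, in case it helps. It assumes std::memset as a stand-in for ZeroMemory (both take a size in bytes); AllZero and DemoZeroing are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns true if every element of the buffer is zero.
bool AllZero(const wchar_t* buf, std::size_t count)
{
    assert(buf != nullptr);
    for (std::size_t i = 0; i < count; ++i)
        if (buf[i] != L'\0') return false;
    return true;
}

bool DemoZeroing()
{
    // 1. Aggregate initialization zeroes every element of a local array.
    wchar_t a[100] = {0};

    // 2. std::memset takes a size in BYTES, just like ZeroMemory.
    wchar_t b[100];
    std::memset(b, 0, 100 * sizeof(wchar_t));

    // 3. new[] followed by () value-initializes (zeroes) the elements.
    wchar_t* c = new wchar_t[100]();

    bool ok = AllZero(a, 100) && AllZero(b, 100) && AllZero(c, 100);
    delete[] c;
    return ok;
}
```

Options 1 and 3 avoid the byte-count arithmetic entirely, which is usually where the mistakes creep in.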
Also just to make sure: have you read the other answers to your other question? They are more geared toward the task you were trying to do (and I'm not at all knowledgeable on the process APIs; I just checked the docs for my answer).
I am re-writing a C++ method from some code I downloaded. The method originally took a PCWSTR as a parameter and then prompted the user to enter a file name. I modified the method to take two parameters (both PCWSTR) and not to prompt the user. I am already generating the list of files somewhere else. I am attempting to call my new (modified) method with both parameters from my method that iterates the list of files.
The original method prompted the user for input using a StringCbGetsW command. Like this...
HRESULT tst=S_OK; //these are at the top of the method
WCHAR fname[85] = {0}; //these are at the top of the method
tst = StringCbGetsW(fname,sizeof(fname));
The wchar fname gets passed to another iteration method further down. When I look at that method, it says it's a LPCWSTR type; I'm assuming it can take the WCHAR instead.
But what it can't do is take the PCWSTR that the method got handed. My ultimate goal is to try not prompt the user for the file name and to take instead the filename that was iterated earlier in another method.
tl;dr. I have a PCWSTR and it needs to get converted to a WCHAR. I don't know what a WCHAR [] is or how to do anything with it. Including to try to do a printf to see what it is.
PS...I know there are easier ways to move and copy around files, there is a reason I'm attempting to make this work using a program.
First, let's try to make some clarity on some Windows specific types.
WCHAR is a typedef for wchar_t.
On Windows with Microsoft Visual C++, it's a 16-bit character type (that can be used for Unicode UTF-16 strings).
PCWSTR and LPCWSTR are two different names for the same thing: they are basically typedefs for const wchar_t*.
The initial L in LPCWSTR is some legacy prefix that, read with the following P, stands for "long pointer". I've never programmed Windows in the 16-bit era (I started with Windows 95 and Win32), but my understanding is that in 16-bit Windows there were something like near pointers and far, or long pointers. Now we have just one type of pointers, so the L prefix can be omitted.
The P stands for "pointer".
The C stands for "constant".
The W stands for WCHAR/wchar_t, and last but not least, the STR part stands for "string".
So, decoding this kind of "Hungarian Notation", PCWSTR means const wchar_t*.
Basically, it's a pointer to a read-only NUL-terminated wchar_t Unicode UTF-16 string.
Is this information enough for you to solve your problem?
If you have a wchar_t string buffer, and a function that expects a PCWSTR, you can just pass the name of the buffer (corresponding to the address of its first character) to the function:
WCHAR buffer[100];
DoSomething(buffer, ...); // DoSomething(PCWSTR ....)
Sometimes - typically for output string parameters - you may also want to specify the size (i.e. "capacity") of the destination string buffer.
If this size is expressed using a count in characters (in this case, in wchar_ts), then the usual Win32 Hungarian Notation is cch ("count of characters"); else, if you want the size expressed in bytes, then the usual prefix is cb ("count of bytes").
So, if you have a function like StringCchCopy(), then from the Cch part you know the size is expressed in characters (wchar_ts).
Note that you can use _countof() to get the size of a buffer in wchar_ts.
e.g. in the above code snippet, _countof(buffer) == 100, since buffer is made of 100 wchar_ts; sizeof(buffer) == 200, on the other hand, since each wchar_t is 2 bytes == 16 bits in size with Visual C++, so the total buffer size in bytes is 100 [wchar_t] * 2 [bytes/wchar_t] = 200 [bytes].
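Here is a rough portable sketch of the cch/cb distinction and of passing a buffer where a const wchar_t* is expected. CountOf is a hypothetical stand-in for _countof, and LengthOf and DemoSizes are invented for the example. Note that sizeof(wchar_t) is 2 with Visual C++ but is often 4 elsewhere, so the sketch computes the byte count instead of hard-coding 200.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for _countof: number of elements in a real array.
template <typename T, std::size_t N>
constexpr std::size_t CountOf(const T (&)[N]) { return N; }

// Hypothetical function taking what PCWSTR expands to on Windows.
std::size_t LengthOf(const wchar_t* text)
{
    assert(text != nullptr);
    std::size_t n = 0;
    while (text[n] != L'\0') ++n;
    return n;
}

std::size_t DemoSizes()
{
    wchar_t buffer[100] = L"hello";
    // cch: capacity in characters; cb: capacity in bytes.
    const std::size_t cch = CountOf(buffer);
    const std::size_t cb  = sizeof(buffer);
    assert(cb == cch * sizeof(wchar_t));
    // The array name decays to a pointer to its first character.
    return LengthOf(buffer);
}
```

The same rule applies to real API calls: a function with Cch in its name wants the CountOf-style value, one with Cb wants the sizeof-style value.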
I'm having a problem with comparing 2 char strings that are both the same:
char string[50];
strncpy(string, "StringToCompare", 49);
if( !strcmp("StringToCompare", string) )
//do stuff
else
//the code runs into here even tho both strings are the same...this is what the problem is.
If I use:
strcpy(string, "StringToCompare");
instead of:
strncpy(string, "StringToCompare", 49);
it solves the problem, but I would rather insert the length of the string rather than it getting it itself.
What's going wrong here? How do I solve this problem?
You forgot to put a terminating NUL character at the end of string, so strcmp may run past the end. Use this line of code:
string[49] = '\0';
to solve your problem.
You need to set the null terminator manually when using strncpy:
strncpy(string, "StringToCompare", 49);
string[49] = 0;
Lots of apparent guesses in the other answers, but a quick suggestion.
First of all, the code as written should work (and in fact, does work in Visual Studio 2010). The key is in the details of 'strncpy' -- it will not implicitly add a null terminating character unless the source length is less than the destination length (which it is in this case). strcpy on the other hand does include the null terminator in all cases, suggesting that your compiler isn't properly handling the strncpy function.
So, if this isn't working on your compiler, you should likely initialize your temporary buffer like this:
char string[50] = {0}; // initializes all the characters to 0
// below should be 50, as that is the number of
// characters available in the string (not 49).
strncpy(string, "StringToCompare", 50);
However, I suspect this is likely just an example, and in the real world your source string is 49 (again, you should pass 50 to strncpy in this case) characters or longer, in which case the NULL terminator is NOT being copied into your temporary string.
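To make the long-source case concrete, here is a minimal sketch of a wrapper that always terminates the destination. CopyTerminated is a made-up name, not a standard function; it relies only on the documented behaviour of strncpy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies src into dst, truncating if necessary,
// and always leaves dst null-terminated.
void CopyTerminated(char* dst, std::size_t dstSize, const char* src)
{
    assert(dst != nullptr && src != nullptr && dstSize > 0);
    // Copy at most dstSize - 1 characters. strncpy pads with '\0' when src
    // is shorter, but writes no terminator when src is that long or longer.
    std::strncpy(dst, src, dstSize - 1);
    dst[dstSize - 1] = '\0';
}
```

With a 50-char buffer and "StringToCompare" the explicit terminator is redundant, but with a source of 49 characters or more it is the only thing keeping strcmp inside the buffer.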
I would echo the suggestions in the comments to use std::string if available. It takes care of all of this for you, so you can focus on your implementation rather than these trite details.
The byte count parameter in strncpy tells the function how many bytes to copy, not the length of the character buffer.
So in your case you are asking to copy 49 bytes from your constant string into the buffer, which I don't think is your intent!
However, it doesn't explain why you are getting the anomalous result. What compiler are you using? When I run this code under VS2005 I get the correct behavior.
Note that Microsoft's CRT deprecates strncpy() in favor of strncpy_s, which does want the buffer length passed to it:
strncpy_s(string, sizeof(string), "StringToCompare", 49);
strcpy and strncpy: in this situation they behave identically!!
So you didn't tell us the truth or the whole picture (eg: the string is at least 49 characters long)
I'm a bit confused with differences between unsigned char (which is also BYTE in WinAPI) and char pointers.
Currently I'm working with some ATL-based legacy code and I see a lot of expressions like the following:
CAtlArray<BYTE> rawContent;
CALL_THE_FUNCTION_WHICH_FILLS_RAW_CONTENT(rawContent);
return ArrayToUnicodeString(rawContent);
// or return ArrayToAnsiString(rawContent);
Now, the implementations of ArrayToXXString look the following way:
CStringA ArrayToAnsiString(const CAtlArray<BYTE>& array)
{
CAtlArray<BYTE> copiedArray;
copiedArray.Copy(array);
copiedArray.Add('\0');
// Casting from BYTE* -> LPCSTR (const char*).
return CStringA((LPCSTR)copiedArray.GetData());
}
CStringW ArrayToUnicodeString(const CAtlArray<BYTE>& array)
{
CAtlArray<BYTE> copiedArray;
copiedArray.Copy(array);
copiedArray.Add('\0');
copiedArray.Add('\0');
// Same here.
return CStringW((LPCWSTR)copiedArray.GetData());
}
So, the questions:
Is the C-style cast from BYTE* to LPCSTR (const char*) safe for all possible cases?
Is it really necessary to add double null-termination when converting array data to wide-character string?
The conversion routine CStringW((LPCWSTR)copiedArray.GetData()) seems invalid to me, is that true?
Any way to make all this code easier to understand and to maintain?
The C standard is kind of weird when it comes to the definition of a byte. You do have a couple of guarantees though.
A byte will always be one char in size
sizeof(char) always returns 1
A byte will be at least 8 bits in size
This definition doesn't mesh well with older platforms where a byte was 6 or 7 bits long, but it does mean BYTE* and char* are guaranteed to be equivalent.
Multiple nulls are needed at the end of a Unicode string because there are valid Unicode characters that start with a zero (null) byte.
As for making the code easier to read, that is completely a matter of style. This code appears to be written in a style used by a lot of old C Windows code, which has definitely fallen out of favor. There are probably a ton of ways to make it clearer for you, but how to make it clearer has no clear answer.
Yes, it is always safe. Because they both point to an array of single-byte memory locations.
LPCSTR: Long Pointer to Const (single-byte) String
LPCWSTR : Long Pointer to Const Wide (multi-byte) String
LPCTSTR : Long Pointer to Const context-dependent (single-byte or multi-byte) String
In wide character strings, every single character occupies 2 bytes of memory, and the length of the memory location containing the string must be a multiple of 2. So if you want to add a wide '\0' to the end of a string, you should add two bytes.
Sorry for this part, I do not know ATL and I cannot help you on this part, but actually I see no complexity here, and I think it is easy to maintain. What code do you really want to make easier to understand and maintain?
If the BYTE* behaves like a proper string (i.e. the last BYTE is 0), you can cast a BYTE* to a LPCSTR, yes. Functions working with LPCSTR assume zero-terminated strings.
I think the multiple zeroes are only necessary when dealing with some multibyte character sets. The most common 8-bit encodings (like ordinary Windows Western and also UTF-8) don't require them.
The CString is Microsoft's best attempt at user-friendly strings. For instance, its constructor can handle both char and wchar_t type input, regardless of whether the CString itself is wide or not, so you don't have to worry about the conversion much.
Edit: wait, now I see that they are abusing a BYTE array for storing wide chars in. I couldn't recommend that.
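If you do end up with text sitting in a byte array, one way to avoid both the pointer cast and the hand-appended terminators is to build the string from a pointer plus a length. The sketch below uses standard C++ types rather than ATL so it can be compiled anywhere; Byte, BytesToNarrow and BytesToUtf16 are names invented here, and it assumes the wide data is UTF-16 in the machine's native byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

typedef unsigned char Byte; // stand-in for the WinAPI BYTE typedef

// Narrow case: construct from a range, so no terminator is needed.
std::string BytesToNarrow(const std::vector<Byte>& bytes)
{
    return std::string(bytes.begin(), bytes.end());
}

// Wide case: copy the bytes into real 16-bit code units instead of
// casting the byte pointer. A trailing odd byte is ignored.
std::u16string BytesToUtf16(const std::vector<Byte>& bytes)
{
    std::u16string text(bytes.size() / sizeof(char16_t), u'\0');
    assert(text.size() * sizeof(char16_t) <= bytes.size());
    if (!text.empty())
        std::memcpy(&text[0], bytes.data(), text.size() * sizeof(char16_t));
    return text;
}
```

Copying into a properly typed buffer sidesteps any alignment or aliasing questions about reading a BYTE array through a wide-character pointer, and it makes the "two bytes per character" assumption explicit in one place.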
An LPCWSTR is a String with 2 Bytes per character, a "char" is one Byte per character. That means you cannot cast it in C-style, because you have to adjust the memory (add a "0" before each standard-ASCII), and not just read the Data in a different way from the memory (what a C-Cast would do).
So the cast is not really safe, I would say.
The Double-Nulltermination: You have always 2 Bytes as one Character, so your "End-of-string" sign must be 2 Bytes long.
To make that code easier to understand, take a look at lexical_cast in Boost (http://www.boost.org/doc/libs/1_48_0/doc/html/boost_lexical_cast.html)
Another way would be using the std::strings (using like std::basic_string; ), and you can perform on String operations.
I have a problem with wchar_t* to char* conversion.
I'm getting a wchar_t* string from the FILE_NOTIFY_INFORMATION structure, returned by the ReadDirectoryChangesW WinAPI function, so I assume that string is correct.
Assume that wchar string is "New Text File.txt"
In the Visual Studio debugger, when hovering over the variable it shows "N" and some unknown Chinese letters, though in the watch window the string is represented correctly.
When I try to convert wchar to char with wcstombs
wcstombs(pfileName, pwfileName, fileInfo.FileNameLength);
it converts just two letters to char* ("Ne") and then generates an error.
Some internal error in wcstombs.c at function _wcstombs_l_helper() at this block:
if (*pwcs > 255) /* validate high byte */
{
errno = EILSEQ;
return (size_t)-1; /* error */
}
It's not thrown up as exception.
What can be the problem?
In order to do what you're trying to do The Right Way, there are several nontrivial things that you need to take into account. I'll do my best to break them down for you here.
Let's start with the definition of the count parameter from the wcstombs() function's documentation on MSDN:
The maximum number of bytes that can be stored in the multibyte output string.
Note that this does NOT say anything about the number of wide characters in the wide character input string. Even though all of the wide characters in your example input string ("New Text File.txt") can be represented as single-byte ASCII characters, we cannot assume that each wide character in the input string will generate exactly one byte in the output string for every possible input string (if this statement confuses you, you should check out Joel's article on Unicode and character sets). So, if you pass wcstombs() the size of the output buffer, how does it know how long the input string is? The documentation states that the input string is expected to be null-terminated, as per the standard C language convention:
If wcstombs encounters the wide-character null character (L'\0') either before or when count occurs, it converts it to an 8-bit 0 and stops.
Though this isn't explicitly stated in the documentation, we can infer that if the input string isn't null-terminated, wcstombs() will keep reading wide characters until it has written count bytes to the output string. So if you're dealing with a wide character string that isn't null-terminated, it isn't enough to just know how long the input string is; you would have to somehow know exactly how many bytes the output string would need to be (which is impossible to determine without doing the conversion) and pass that as the count parameter to make wcstombs() do what you want it to do.
Why am I focusing so much on this null-termination issue? Because the FILE_NOTIFY_INFORMATION structure's documentation on MSDN has this to say about its FileName field:
A variable-length field that contains the file name relative to the directory handle. The file name is in the Unicode character format and is not null-terminated.
The fact that the FileName field isn't null-terminated explains why it has a bunch of "unknown Chinese letters" at the end of it when you look at it in the debugger. The FILE_NOTIFY_INFORMATION structure's documentation also contains another nugget of wisdom regarding the FileNameLength field:
The size of the file name portion of the record, in bytes.
Note that this says bytes, not characters. Therefore, even if you wanted to assume that each wide character in the input string will generate exactly one byte in the output string, you shouldn't be passing fileInfo.FileNameLength for count; you should be passing fileInfo.FileNameLength / sizeof(WCHAR) (or use a null-terminated input string, of course). Putting all of this information together, we can finally understand why your original call to wcstombs() was failing: it was reading past the end of the string and choking on invalid data (thereby triggering the EILSEQ error).
Now that we've elucidated the problem, it's time to talk about a possible solution. In order to do this The Right Way, the first thing you need to know is how big your output buffer needs to be. Luckily, there is one final tidbit in the documentation for wcstombs() that will help us out here:
If the mbstr argument is NULL, wcstombs returns the required size in bytes of the destination string.
So the idiomatic way to use the wcstombs() function is to call it twice: the first time to determine how big your output buffer needs to be, and the second time to actually do the conversion. The final thing to note is that as we stated previously, the wide character input string needs to be null-terminated for at least the first call to wcstombs().
Putting this all together, here is a snippet of code that does what you are trying to do:
size_t fileNameLengthInWChars = fileInfo.FileNameLength / sizeof(WCHAR); //get the length of the filename in characters
WCHAR *pwNullTerminatedFileName = new WCHAR[fileNameLengthInWChars + 1]; //allocate an intermediate buffer to hold a null-terminated version of fileInfo.FileName; +1 for null terminator
wcsncpy(pwNullTerminatedFileName, fileInfo.FileName, fileNameLengthInWChars); //copy the filename into the intermediate buffer
pwNullTerminatedFileName[fileNameLengthInWChars] = L'\0'; //null terminate the new buffer
size_t fileNameLengthInChars = wcstombs(NULL, pwNullTerminatedFileName, 0); //first call to wcstombs() determines how long the output buffer needs to be
char *pFileName = new char[fileNameLengthInChars + 1]; //allocate the final output buffer; +1 to leave room for null terminator
wcstombs(pFileName, pwNullTerminatedFileName, fileNameLengthInChars + 1); //finally do the conversion!
Of course, don't forget to call delete[] pwNullTerminatedFileName and delete[] pFileName when you're done with them to clean up.
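If you would rather not manage the new[]/delete[] pairs by hand, the same two-call idiom can be sketched with standard containers. WideNameToNarrow is a hypothetical helper; its name and lengthInBytes parameters stand in for fileInfo.FileName and fileInfo.FileNameLength. It relies on wcstombs returning the required size when the destination is NULL, which the MSDN documentation quoted above and POSIX both describe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper. `name` need not be null-terminated; `lengthInBytes`
// plays the role of FILE_NOTIFY_INFORMATION::FileNameLength.
// Returns false if a character cannot be converted in the current locale.
bool WideNameToNarrow(const wchar_t* name, std::size_t lengthInBytes,
                      std::string& out)
{
    assert(name != nullptr);
    // Bytes -> characters, then make a null-terminated copy.
    std::wstring terminated(name, lengthInBytes / sizeof(wchar_t));

    // First call: ask how many bytes the output needs (terminator excluded).
    std::size_t needed = std::wcstombs(nullptr, terminated.c_str(), 0);
    if (needed == static_cast<std::size_t>(-1))
        return false;

    // Second call: convert into a buffer with room for the terminator.
    std::vector<char> buffer(needed + 1);
    std::size_t written =
        std::wcstombs(buffer.data(), terminated.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1))
        return false;

    out.assign(buffer.data(), written);
    return true;
}
```

Checking for (size_t)-1 on both calls matters: that is exactly the EILSEQ failure from the question, and it is reported through the return value rather than an exception.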
ONE LAST THING
After writing this answer, I reread your question a bit more closely and thought of another mistake you may be making. You say that wcstombs() fails after just converting the first two letters ("Ne"), which means that it's hitting uninitialized data in the input string after the first two wide characters. Did you happen to use the assignment operator to copy one FILE_NOTIFY_INFORMATION variable to another? For example,
FILE_NOTIFY_INFORMATION fileInfo = someOtherFileInfo;
If you did this, it would only copy the first two wide characters of someOtherFileInfo.FileName to fileInfo.FileName. In order to understand why this is the case, consider the declaration of the FILE_NOTIFY_INFORMATION structure:
typedef struct _FILE_NOTIFY_INFORMATION {
DWORD NextEntryOffset;
DWORD Action;
DWORD FileNameLength;
WCHAR FileName[1];
} FILE_NOTIFY_INFORMATION, *PFILE_NOTIFY_INFORMATION;
When the compiler generates code for the assignment operation, it doesn't understand the trickery that is being pulled with FileName being a variable length field, so it just copies sizeof(FILE_NOTIFY_INFORMATION) bytes from someOtherFileInfo to fileInfo. Since FileName is declared as an array of one WCHAR, you would think that only one character would be copied, but the compiler pads the struct to be an extra two bytes long (so that its length is an integer multiple of the size of an int), which is why a second WCHAR is copied as well.
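To see the numbers for yourself, here is a small sketch using a hypothetical mirror of the structure built from fixed-width types (std::uint32_t for DWORD, char16_t for a 16-bit WCHAR). NotifyRecord, RecordSize and CopyRecord are invented names; on typical platforms the name starts at offset 12 and sizeof is 16, so a plain assignment carries only 4 bytes (two characters) of the name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical mirror of the FILE_NOTIFY_INFORMATION layout.
struct NotifyRecord {
    std::uint32_t NextEntryOffset;
    std::uint32_t Action;
    std::uint32_t FileNameLength; // in bytes
    char16_t      FileName[1];    // really FileNameLength bytes long
};

// Bytes actually occupied by one record, including the full file name.
std::size_t RecordSize(const NotifyRecord& record)
{
    assert(record.FileNameLength % sizeof(char16_t) == 0);
    return offsetof(NotifyRecord, FileName) + record.FileNameLength;
}

// Copies the whole record, name included, into its own storage.
// Assumes `record` really sits at the front of a buffer that large.
std::vector<unsigned char> CopyRecord(const NotifyRecord& record)
{
    std::vector<unsigned char> bytes(RecordSize(record));
    std::memcpy(bytes.data(), &record, bytes.size());
    return bytes;
}
```

The point is that the copy length has to come from FileNameLength, not from sizeof, because the compiler only knows about the one-element array.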
My guess is that the wide string that you are passing is invalid or incorrectly defined.
How is pwfileName defined? It seems you have a FILE_NOTIFY_INFORMATION structure defined as fileInfo, so why are you not using fileInfo.FileName, as shown below?
wcstombs(pfileName, fileInfo.FileName, fileInfo.FileNameLength);
The error you get says it all: it found a character that it cannot convert to MB (because it has no representation in MB). Source:
If wcstombs encounters a wide character it cannot convert to a
multibyte character, it returns –1 cast to type size_t and sets errno
to EILSEQ
In cases like this you should avoid 'assumed' input, and give an actual test case that fails.