Unicode causes closing messagebox to terminate program - c++

I'm developing a Win32 API Wrapper. To make it Unicode-compliant, I do the following:
#ifndef UNICODE
#define gchar char
#define gstrcpy strcpy
#define gstrncpy strncpy
#define gstrlen strlen
#define gstrcat strcat
#define gstrncat strncat
#define gstrcmp strcmp
#define gstrtok strtok
#else
#define gchar wchar_t
#define gstrcpy lstrcpy
#define gstrncpy lstrncpy
#define gstrlen lstrlen
#define gstrcat lstrcat
#define gstrncat lstrncat
#define gstrcmp lstrcmp
#define gstrtok lstrtok
#endif
I also provide
#define uni(s) TEXT(s)
My test consisted of a window that creates a message box via
msg (uni("Left-click"));
whenever the user left-clicks the window. When I #define UNICODE, the program crashes with 0xC0000005 after 4 or 5 of these message boxes have been closed, no matter how many were created: the next message box shown, whether it is a new one or the one underneath the last one closed, brings the program down. Without UNICODE defined, everything works perfectly. My msg function is as follows:
dword msg (cstr = uni(""), cstr = uni(""), hwin = null, dword = 0);
...
dword msg (cstr lpText, cstr lpCaption, hwin hWnd, dword uType)
{
return MessageBox (hWnd, lpText, lpCaption, uType);
}
where dword is DWORD, cstr is pchar, which is gchar *, which can be char * or wchar_t *, hwin is HWND, and null is 0.
It probably isn't the message box doing this, but I haven't done any other text-related stuff with the testing so I'll see if it crashes some other way too.
Does anyone know why this would happen? The difference between MB characters and unicode shouldn't cause the program to repeatedly crash. I can upload the headers and the test too, if needed.
Edit:
I just found out creating one message and then closing the actual window will result in the same crash. SOURCE CODE Here's the link for the source. Please keep in mind:
a) I only took one first-year programming course, ever (C++).
b) My wrapper's purpose is to make writing win32 apps as easy as possible.
c) I like to make things of my own (string class etc).
Also forgot this (duh), I'm using Code::Blocks (MinGW).
Edit:
I didn't realize before, but the program is trying to access memory at 0x00000000. This is what's causing the problem, but I have no idea why it would be trying to do this. I believe the instruction trying to access it is located somewhere in winnt.dll, but having never learned how to debug, I'm still trying to figure out how to find the information I need.
Edit:
Now, without changing it, but running it on a different computer, it's referencing 0x7a797877 instead of 0.
Edit: Changing the window procedure to include WM_LBUTTONDOWN and call msg() inside, rather than calling the added procedure makes the program work perfectly. Something with the way addmsg() and the window procedure are coded causes the _lpWindowName and _lpClassName to have corrupted data after a while, but non-pointer members are still preserved.
EDIT:
After all of this mayhem I finally found out I was missing a single character in all of my source code. When I defined msgparams as Window, UINT, WPARAM, LPARAM and likewise with msgfillparams (except with names) I forgot to pass a reference. I was passing the Window by value! I'd still like to thank everyone who posted, as I did get my butt kicked debugger-wise and ended up learning a lot more about Unicode as well.
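The difference between passing by value and by reference can be sketched in a few lines. The names here (Window, on_click_by_value, on_click_by_reference) are hypothetical and are not taken from the wrapper's source, which is not shown. If the real class owns raw pointers and frees them in its destructor, a by-value copy could free memory the original still points to, which would be consistent with the corrupted _lpWindowName described above, but that part is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the wrapper's Window class.
struct Window {
    std::wstring name;
    int clicks;
};

// By value: the handler receives its own copy, so the caller's object is untouched.
void on_click_by_value(Window w) { ++w.clicks; }

// By reference: the handler works on the caller's object directly.
void on_click_by_reference(Window& w) { ++w.clicks; }

// Click count the caller sees after one by-value call.
int clicks_seen_after_value_call() {
    Window w{L"main", 0};
    on_click_by_value(w);
    assert(w.name == L"main");
    return w.clicks;
}

// Click count the caller sees after one by-reference call.
int clicks_seen_after_reference_call() {
    Window w{L"main", 0};
    on_click_by_reference(w);
    assert(w.name == L"main");
    return w.clicks;
}
```

Only the by-reference handler changes the object the caller holds; the by-value handler's increment is lost with the temporary copy.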

You should do your homework before asking questions on SO. My impression is that you have almost no idea how Unicode works on Windows, and it would take many pages to explain.
Porting an application from ANSI to Unicode is a big deal on Windows. It may be reasonable to pay someone with experience to do this.
Essentially everything that worked with char will have to work with wchar_t.
The entire API has other functions, but you should start by using the Windows support for this rather than writing your own macros. The first step is to use _T, not W, so you'll be able to start changing code and still compile in both Unicode and ANSI.

Why are you even bothering with ANSI in the first place? All the TCHAR support dates back to a time when Win95 was commonplace, so developers had to write code that could compile as ANSI (for Win95) or UNICODE (for NT-based Windows). Now that Win95 is long obsolete, there's no need to bother with TCHAR: just go all-UNICODE, using L"Unicode strings" instead of TEXT() and wcs-versions of the CRT rather than the _t-versions.
Having said that, here are some common sources of errors with ANSI/UNICODE code that could explain some of what you are seeing:
One possibility is that there's a bug somewhere that's corrupting the stack - uninitialized variable, stack overrun, and the like. In unicode, any chars or strings on the stack may take up a different amount of space compared to the ANSI version, so variables will end up in different places relative to one another. Chances are you are 'getting lucky' in the ANSI build, and whatever is being corrupted isn't important data; but on the UNICODE build, something important on the stack is getting nuked. (For example, if you overflow a buffer on the stack, you could end up overwriting the return address that's also on the stack, likely causing a crash at the next function return.)
--
Watch out for cases where you are mixing up character counts versus byte counts: with ANSI, you can use 'sizeof()' almost interchangeably with a character count (depending on whether you're counting the terminating NUL space or not); but with UNICODE, you can't: and if you get them mixed up, you can get a buffer overrun very easily.
For example:
// Incorrectly passing byte count instead of character count
WCHAR szWindowName[32];
GetWindowTextW( hwnd, szWindowName, sizeof(szWindowName) );
this can cause a buffer overrun (leading to crash - if you're lucky - or silently corrupted data and incorrect results later on if you're not lucky) since it's passing 64 - the size in bytes - to GetWindowText, instead of 32, the size in characters.
On Windows, use ARRAYSIZE(...) instead of sizeof() to get the number of elements in an array rather than its size in bytes.
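The same distinction can be written in portable C++ without the Windows macro. This is a minimal sketch; element_count and byte_count are illustrative names, not part of any Windows header, and the byte size is written as a multiple of sizeof(wchar_t) because wchar_t is 2 bytes on Windows but may be larger elsewhere.

```cpp
#include <cassert>
#include <cstddef>

// Number of elements in a built-in array, deduced at compile time.
// Same idea as ARRAYSIZE, written as a template.
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) { return N; }

// Size of the same array in bytes.
template <typename T, std::size_t N>
constexpr std::size_t byte_count(const T (&)[N]) { return N * sizeof(T); }

// What a "count of characters" parameter should receive for a 32-element buffer.
std::size_t window_name_capacity() {
    wchar_t name[32] = {};
    assert(sizeof(name) == byte_count(name));
    return element_count(name);
}

// What sizeof() reports for the same buffer.
std::size_t window_name_bytes() {
    wchar_t name[32] = {};
    return byte_count(name);
}
```

Passing window_name_bytes() where a character count is expected is exactly the overrun shown in the GetWindowTextW example.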
--
Another thing to watch for is any strings where you have used casts to "force" them into CHAR or WCHAR to avoid compiler errors: eg.
// Incorrectly calling ANSI function with UNICODE strings...
MessageBoxA(hwnd, (LPCSTR)L"Unicode Title", (LPCSTR)L"Unicode content", MB_OK);
This type of usage typically results in just the first character of the string showing.
// Incorrectly calling UNICODE function with ANSI strings...
MessageBoxW(hwnd, (LPCWSTR)"ANSI Title", (LPCWSTR)"ANSI content", MB_OK);
This is trickier, you may get a string of garbage, or could get an error of some kind.
These cases are easy to spot because there are casts. Generally speaking, casts should be viewed as red flags and avoided at all costs. Don't use them to silence a compiler error; instead, fix the issue that the compiler is complaining about.
Also watch out for cases where you can get these mixed up but where the compiler won't warn you - eg with printf, scanf and friends: the compiler doesn't check the argument lists:
// Incorrectly calling ANSI function with UNICODE string - compiler won't warn you here...
LPCWSTR pString = L"I'm unicode!";
printf("The result is: %s\n", pString);
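For the printf case, the standard fix is to tell the narrow function that the argument is wide by using %ls. The sketch below uses snprintf so the result can be inspected; describe is a hypothetical helper name, and the conversion is only assumed to succeed for plain ASCII text in the default locale.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats a wide string through the narrow printf family.
// %ls marks the argument as const wchar_t*; plain %s would expect const char*.
std::string describe(const wchar_t* text) {
    assert(text != nullptr);
    char buffer[64];
    int written = std::snprintf(buffer, sizeof(buffer), "The result is: %ls", text);
    // A negative return means the wide-to-narrow conversion failed.
    return (written >= 0) ? std::string(buffer) : std::string();
}
```

Many compilers can warn about mismatched format arguments when warnings are enabled, so it is worth turning those on rather than relying on inspection.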

Related

Using StringCchCat

I'm trying to use the StringCchCat function:
HRESULT X;
LPWSTR _strOutput = new wchar_t[100];
LPCWSTR Y =L"Sample Text";
X = StringCchCat(_strOutput, 100, Y);
But for some reason I keep getting the "E_INVALIDARG One or more arguments are invalid." error from X. _strOutput is also full of random characters.
This is actually part of a bigger program. What I'm trying to do is concatenate "Sample Text" to the empty _strOutput variable. This is inside a loop, so it is going to happen multiple times. For this particular example it will be as if I'm assigning the text "Sample Text" to _strOutput.
Any Ideas?
If it's part of a loop, a simple *_strOutput = 0; will fix your issue.
If you're instead trying to copy a string, not concatenate it, there's a special function that does this for you: StringCchCopy.
Edit: As an aside, if you're using the TCHAR version of the API (and you are), you should declare your strings as TCHAR arrays (ie LPTSTR instead of LPWSTR, and _T("") instead of L""). This would keep your code at least mildly portable.
String copy/concat functions look for null terminators to know where to copy/concat to. You need to initialize the first element of _strOutput to zero so the buffer is null terminated, then you can copy/concat values to it as needed:
LPWSTR _strOutput = new wchar_t[100];
_strOutput[0] = L'\0'; // <-- add this
X = StringCchCat(_strOutput, 100, Y);
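Why the terminator matters can be shown with a small stand-in. append_bounded below is a hypothetical helper, not StringCchCat and not its actual implementation; it only mirrors the idea that a bounded concatenation must first find the end of the existing string inside the stated capacity, and has to give up if there is none.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical bounded append. Returns false if dest has no terminator within
// capacity, or if src had to be truncated to fit.
bool append_bounded(wchar_t* dest, std::size_t capacity, const wchar_t* src) {
    assert(dest != nullptr && src != nullptr);
    std::size_t len = 0;
    while (len < capacity && dest[len] != L'\0') ++len;   // find end of current text
    if (len == capacity) return false;                    // no terminator: nothing safe to do
    while (*src != L'\0' && len + 1 < capacity) dest[len++] = *src++;
    dest[len] = L'\0';
    return *src == L'\0';
}

// Buffer whose first element was set to zero before appending.
bool append_to_empty_buffer() {
    wchar_t out[16];
    out[0] = L'\0';
    return append_bounded(out, 16, L"Sample Text") && std::wcscmp(out, L"Sample Text") == 0;
}

// Buffer that is full of non-zero characters, like leftover garbage.
bool append_to_unterminated_buffer() {
    wchar_t out[4] = {L'a', L'b', L'c', L'd'};
    return append_bounded(out, 4, L"x");
}
```

The second case fails for the same kind of reason the question's call did: the destination never contained a valid empty string to append to.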
I'm writing this answer to notify you (so you see the red 1 at the top of any Stack Overflow page) because you had the same bug yesterday (in your message box) and I now realize I neglected to say this in my answer yesterday.
Keep in mind that the new[] operator on a built-in type like WCHAR or int does NOT initialize the data at all. The memory you get will have whatever garbage was there before the call to new[], whatever that is. The same happens if you say WCHAR x[100]; as a local variable. You must be careful to initialize data before using it. Compilers are usually good at warning you about this. (I believe C++ objects have their constructors called for each element, so that won't give you an error... unless you forget to initialize something in the class, of course. It's been a while.)
In many cases you'll want everything to be zeroes. The '\0'/L'\0' character is also a zero. The Windows API has a function ZeroMemory() that's a shortcut for filling memory with zeroes:
ZeroMemory(array, size of array in bytes)
So to initialize a WCHAR str[100] you can say
ZeroMemory(str, 100 * sizeof (WCHAR))
where the sizeof (WCHAR) turns 100 WCHARs into its equivalent byte count.
As the other answers say, simply setting the first character of a string to zero will be sufficient for a string. Your choice.
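Standard C++ can also do the zeroing at the point of allocation, without a separate call. A minimal sketch, with make_zeroed_buffer and all_zero as illustrative helper names: the empty parentheses after new wchar_t[count] request value-initialization, and = {} does the same for a local array.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// new T[n]() value-initializes every element, so each wchar_t starts as L'\0'.
// Without the trailing (), the contents would be indeterminate.
std::unique_ptr<wchar_t[]> make_zeroed_buffer(std::size_t count) {
    assert(count > 0);
    return std::unique_ptr<wchar_t[]>(new wchar_t[count]());
}

// True when the first count elements are all zero.
bool all_zero(const wchar_t* p, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        if (p[i] != L'\0') return false;
    }
    return true;
}

// The local-array equivalent: = {} zero-fills the whole array.
bool local_array_is_zeroed() {
    wchar_t x[100] = {};
    return all_zero(x, 100);
}
```

The unique_ptr is only there so the sketch does not leak; the initialization behaviour comes from the new expression itself.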
Also just to make sure: have you read the other answers to your other question? They are more geared toward the task you were trying to do (and I'm not at all knowledgeable on the process APIs; I just checked the docs for my answer).

Have a PCWSTR and need it to be a WCHAR[]

I am re-writing a C++ method from some code I downloaded. The method originally took a PCWSTR as a parameter and then prompted the user to enter a file name. I modified the method to take two parameters (both PCWSTR) and not to prompt the user. I am already generating the list of files somewhere else. I am attempting to call my new (modified) method with both parameters from my method that iterates the list of files.
The original method prompted the user for input using a StringCbGetsW command, like this:
HRESULT tst=S_OK; //these are at the top of the method
WCHAR fname[85] = {0}; //these are at the top of the method
tst = StringCbGetsW(fname,sizeof(fname));
The wchar fname gets passed to another iteration method further down. When I look at that method, it says it's a LPCWSTR type; I'm assuming it can take the WCHAR instead.
But what it can't do is take the PCWSTR that the method got handed. My ultimate goal is to try not prompt the user for the file name and to take instead the filename that was iterated earlier in another method.
tl;dr. I have a PCWSTR and it needs to get converted to a WCHAR. I don't know what a WCHAR [] is or how to do anything with it. Including to try to do a printf to see what it is.
PS...I know there are easier ways to move and copy around files, there is a reason I'm attempting to make this work using a program.
First, let's clarify some Windows-specific types.
WCHAR is a typedef for wchar_t.
On Windows with Microsoft Visual C++, it's a 16-bit character type (that can be used for Unicode UTF-16 strings).
PCWSTR and LPCWSTR are two different names for the same thing: they are basically typedefs for const wchar_t*.
The initial L in LPCWSTR is some legacy prefix that, read with the following P, stands for "long pointer". I've never programmed Windows in the 16-bit era (I started with Windows 95 and Win32), but my understanding is that in 16-bit Windows there were something like near pointers and far, or long pointers. Now we have just one type of pointers, so the L prefix can be omitted.
The P stands for "pointer".
The C stands for "constant".
The W stands for WCHAR/wchar_t, and last but not least, the STR part stands for "string".
So, decoding this kind of "Hungarian Notation", PCWSTR means const wchar_t*.
Basically, it's a pointer to a read-only NUL-terminated wchar_t Unicode UTF-16 string.
Is this information enough for you to solve your problem?
If you have a wchar_t string buffer, and a function that expects a PCWSTR, you can just pass the name of the buffer (corresponding to the address of its first character) to the function:
WCHAR buffer[100];
DoSomething(buffer, ...); // DoSomething(PCWSTR ....)
Sometimes - typically for output string parameters - you may also want to specify the size (i.e. "capacity") of the destination string buffer.
If this size is expressed as a count of characters (in this case, wchar_ts), then the usual Win32 Hungarian Notation prefix is cch ("count of characters"); if instead the size is expressed in bytes, the usual prefix is cb ("count of bytes").
So, if you have a function like StringCchCopy(), then from the Cch part you know the size is expressed in characters (wchar_ts).
Note that you can use _countof() to get the size of a buffer in wchar_ts.
E.g. in the above code snippet, _countof(buffer) == 100, since buffer is made of 100 wchar_ts; sizeof(buffer) == 200 instead, since each wchar_t is 2 bytes == 16 bits in size, so the total buffer size in bytes is 100 [wchar_t] * 2 [bytes/wchar_t] = 200 [bytes].
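For the direction the question asks about (a const wchar_t* that has to end up in a WCHAR array), a bounded copy is enough. The sketch below uses only the standard library; copy_to_array, name_length and demo_copy_and_pass are hypothetical names, and on Windows a function such as StringCchCopy would play the role of the copy. If the downstream function already accepts a PCWSTR, the pointer can probably be passed straight through and no copy is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Copies a read-only wide string into a fixed-size array, truncating if needed.
// Returns true when the whole source fit, terminator included.
template <std::size_t N>
bool copy_to_array(wchar_t (&dest)[N], const wchar_t* src) {
    static_assert(N > 0, "destination needs room for the terminator");
    assert(src != nullptr);
    const std::size_t len = std::wcslen(src);
    const bool fits = len < N;
    const std::size_t n = fits ? len : N - 1;
    std::wmemcpy(dest, src, n);
    dest[n] = L'\0';
    return fits;
}

// Stand-in for a function declared as taking a PCWSTR (const wchar_t*).
std::size_t name_length(const wchar_t* name) { return std::wcslen(name); }

// Copies the incoming name into a local array, then hands the array on.
std::size_t demo_copy_and_pass(const wchar_t* incoming) {
    wchar_t fname[85] = {0};
    copy_to_array(fname, incoming);
    return name_length(fname);   // the array decays to const wchar_t*
}

// Shows truncation being reported for a buffer that is too small.
bool fits_in_four(const wchar_t* text) {
    wchar_t small[4];
    return copy_to_array(small, text);
}
```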

using wsprintf with malloc

Is there any mistake to write a code such:
char* sp=(char*)malloc(128);
int x=22;
wsprintf(sp,"%d",x);
cout<<sp;
I am asking specially about security mistakes?
There are a number of "potential" issues here. None of them actually breaks any rule, but you may find things not behaving as you expect.
First: wsprintf, as a Win32 API (http://msdn.microsoft.com/en-us/library/windows/desktop/ms647550(v=vs.85).aspx ) is prototyped as:
int __cdecl wsprintf(
_Out_ LPTSTR lpOut,
_In_ LPCTSTR lpFmt,
_In_ ...
);
where LPTSTR is defined as char* or wchar_t* depending on whether the UNICODE symbol is defined (check your project settings and/or build commands).
Now, if you are on an ANSI build (no UNICODE), all types are coherent, but nothing checks that wsprintf writes no more than the 128 chars you allocated. If you just write a decimal integer there will be no problem, but if you (or somebody else after you) later modify the "message" and no checks are made, some surprises may arise (for example, wsprintf(sp,"This is the number I've been told I was supposed to be expected to be: %d",x); will this still fit in 128 chars?).
If you are on a UNICODE build, you allocate 128 chars and write a double-byte string into them. The number 22 will be written as \x32\x00\x32\x00\x00\x00 (32 00 is the little-endian encoding of 0x0032, the wchar_t corresponding to Unicode code point 50, which stands for '2').
If you give that sequence to cout (which is char based, not wchar_t based), it will see the first \x00 as a string terminator and will output just '2'.
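The truncation can be reproduced without any Windows call by spelling out the six bytes by hand. kWide22 and narrow_length_of_wide_22 are illustrative names; the byte layout is the little-endian one described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The six bytes that "22" occupies as little-endian 16-bit characters,
// including the two-byte terminator.
static const char kWide22[] = {'\x32', '\x00', '\x32', '\x00', '\x00', '\x00'};

// Length as seen by char-based code: counting stops at the first zero byte.
std::size_t narrow_length_of_wide_22() {
    assert(kWide22[0] == '2');
    return std::strlen(kWide22);
}
```

char-based code sees a one-character string, which is why only the first '2' reaches the output.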
To be coherent, you should either:
use all char based types and function (OK malloc and cout, but wsprintfA instead of wsprintf)
use all wchar_t based types and function (malloc(128*sizeof(wchar_t)), wchar_t* and wsprintfW)
use all TCHAR based types (malloc(128*sizeof(TCHAR)), TCHAR* and wsprintf, but define tcout as cout or wcout depending on UNICODE).
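As a rough illustration of the all-wchar_t option, here is the same formatting done with the standard swprintf instead of wsprintfW, so it does not depend on windows.h. format_number_wide is a hypothetical helper, and the sketch assumes the standard signature that takes the buffer size as a count of wide characters (some older toolchains shipped a variant without that parameter).

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Formats an int into a wide string; buffer, format string and result are all wide.
std::wstring format_number_wide(int value) {
    wchar_t buffer[128];
    const std::size_t capacity = sizeof(buffer) / sizeof(buffer[0]);  // count of wchar_t, not bytes
    assert(capacity == 128);
    const int written = std::swprintf(buffer, capacity, L"%d", value);
    // A negative return signals an error, including output that did not fit.
    return (written >= 0) ? std::wstring(buffer, static_cast<std::size_t>(written))
                          : std::wstring();
}
```

The result would then go to wcout rather than cout, keeping the whole chain wide.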
There is no security mistake because it is never the case that an int converted to a C string will exceed the size you've allocated.
However this style of programming has the potential for security issues. And history has shown that this kind of code has caused real security issues time and time again. So maybe you should be learning a better style of coding?
This MSDN link lists some concerns over the use of wsprintf. They don't appear to apply to your example but they do give some alternatives that you might want to explore.
Ok given that you have stated you are using winapi, read this from their documentation:
Note Do not use. Consider using one of the following functions
instead: StringCbPrintf, StringCbPrintfEx, StringCchPrintf, or
StringCchPrintfEx. See Security Considerations.
Therefore do not use. I would ignore however what they tell you to do instead and either:
Write in C, and use a function from the C standard library (sprintf, snprintf etc). In that case you cannot use cout.
Write in C++. Use std::string and there is even a new to_string, as well as boost::format and ostringstream to help you build formatted string. You can still use C standard library functions there if you want to as well when it really suits your purpose better, but leave the allocation stuff to the library.
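A minimal sketch of the C++ route, with number_to_string and labelled as illustrative names. Both helpers let the library own the storage, so there is no malloc and no fixed 128-byte limit to outgrow.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// std::to_string sizes the result itself.
std::string number_to_string(int x) {
    return std::to_string(x);
}

// ostringstream for anything more elaborate than a bare number.
std::string labelled(const std::string& label, int x) {
    assert(!label.empty());
    std::ostringstream out;
    out << label << '=' << x;
    return out.str();
}
```

Either result can be written with cout << directly.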

Setting program to be autorun through registry

I have the following code:
http://privatepaste.com/8364a2a7b8/12345
But it only writes "c" (supposedly, conversion to LPBYTE leaves one byte only).
What's the proper way to handle GetModuleFileName and registry edit?
strlen((char*)szPath2)+1
This is most likely where your problem is. I bet your program is compiled in UNICODE mode. strlen only works properly for ASCII strings. (The fact that you're having to cast from TCHAR to char is a big hint that something isn't right.)
To keep consistent with the usage of TCHAR and such, you should probably use _tcslen instead.
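Switching to a character-aware length function is only half of it if the length is then passed as a size in bytes. Assuming the linked code writes the path with RegSetValueEx as a REG_SZ value (the paste is not shown here, so that is a guess), the size argument is in bytes and includes the terminator. A portable sketch of that arithmetic, with wide_string_bytes as a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Size in bytes of a wide string including its terminator:
// (characters + 1) * bytes per character.
std::size_t wide_string_bytes(const wchar_t* s) {
    assert(s != nullptr);
    return (std::wcslen(s) + 1) * sizeof(wchar_t);
}
```

With TCHAR the equivalent expression would be (_tcslen(szPath2) + 1) * sizeof(TCHAR).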

Does SHGetPathFromIDList() (and similar) put a terminating 0 in its argument?

This is actually a question about a huge number of winapi functions.
A typical MS documentation says (from http://msdn.microsoft.com/en-us/library/bb762194(VS.85).aspx ):
BOOL SHGetPathFromIDList(
PCIDLIST_ABSOLUTE pidl,
LPTSTR pszPath
);
pidl [in] The address of an item identifier list that specifies a file
or directory location relative to the root of the namespace (the desktop).
pszPath [out] The address of a buffer to receive the file system path.
This buffer must be at least MAX_PATH characters in size.
Nowhere does it say about whether a terminating 0 is written to pszPath. Also, it doesn't say whether the path can fill the pszPath, leaving no room for 0 there.
Googling around yields about a 50/50 split between users who allocate a buffer with MAX_PATH+1 chars and users who only use MAX_PATH.
While I can certainly do something like char buf[MAX_PATH+1]={0} to be on the safe side, I would really like to know - is there some place where this stuff is described? Some page for all path-related functions maybe, I don't know...
It says "This buffer must be at least MAX_PATH characters in size" for pszPath parameter so MAX_PATH buffer size should be always enough. Also I believe that all Win32 functions dealing with LPCTSTR / LPTSTR parameters expect or return null-terminated strings.
To answer the title question: Yes. It's part of the definition of LPTSTR - a pointer to a string. It is also reflected in the prefix: psz - "Pointer (to) String (terminated by) Zero".
There is a non-null-terminated string type as well, but it's rare in userland APIs: UNICODE_STRING. You see it mostly in kernel-level APIs.
I don't know how this function (or the others) actually behaves, but I'd recommend writing a few unit tests against this function... What happens when you don't use all of the buffer? What happens if you do? etc. Not only will these document your assumptions, but if the function ever changes how it behaves, you'll get a warning from your unit tests instead of experiencing a nasty bug report coming in.
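One way such a test could look is to pre-fill the buffer with a non-zero sentinel, call the function, and check where the first zero ends up. In the sketch below fake_get_path is a hypothetical stand-in for the API under test; it says nothing about how SHGetPathFromIDList actually behaves, and terminator_index and probe_fake are likewise illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the function being probed; writes a terminated path.
void fake_get_path(char* out) { std::strcpy(out, "C:\\Temp"); }

// Fills the buffer with a non-zero sentinel, calls fn, then returns the index of
// the first zero byte, or capacity if the function wrote no terminator at all.
template <typename Fn>
std::size_t terminator_index(Fn fn, char* buffer, std::size_t capacity) {
    assert(buffer != nullptr && capacity > 0);
    std::memset(buffer, 0x7F, capacity);
    fn(buffer);
    for (std::size_t i = 0; i < capacity; ++i) {
        if (buffer[i] == '\0') return i;
    }
    return capacity;
}

// Runs the probe against the stand-in with a 260-byte buffer.
std::size_t probe_fake() {
    char buffer[260];
    return terminator_index(fake_get_path, buffer, sizeof(buffer));
}
```

Because the buffer starts out with no zero bytes, any terminator found afterwards must have been written by the function, which is the assumption the test is meant to document.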